A COMPARATIVE FACTORIAL ANALYSIS OF JOB SEMANTIC
STRUCTURES OF MANAGERS AND WORKERS¹


HARRY C. TRIANDIS 


University of Illinois 


Osgood’s development of a method for the measurement of connotative meaning and his subsequent factor analyses (Osgood & Suci, 1955; Osgood, Suci, & Tannenbaum, 1957) suggested procedures for the study of the relation between language structure and cognitive processes. Replications of this work in other cultures (Kumata, 1958; Kumata & Schramm, 1956; Triandis & Osgood, 1958) suggested that the original Osgood and Suci (1955) findings have considerable generality.

The method used in the factor analyses of meaning can be briefly summarized as follows: about 100 Ss are asked to judge 20 to 30 concepts (depending on the study) on 30 to 76 descriptive, bipolar scales. The matrix of intercorrelations is factored, and typically three factors emerge. The first factor has been consistently identified as an evaluative factor; it includes scales such as high–low, important–unimportant, and good–bad, and typically accounts for 35 to 45% of the common variance. The second factor has been identified with potency and is highly loaded on hard–soft, strong–weak, and severe–lenient; it typically accounts for 15% of the common variance. A third factor, usually called the activity factor, was identified in these studies; it includes active–passive and excitable–calm and accounts for about 10% of the common variance. In some of the studies, for instance the Triandis and Osgood (1958) study, the potency and activity factors appeared fused and accounted for almost 20% of the common variance.

¹ The data collection was supported by a grant from the Foundation for Research on Human Behavior. The analysis was made possible through the facilities of the Institute of Communications Research and Computer Laboratory of the University of Illinois. I am grateful to Edward E. Ware and his staff for processing of the data through the Computer Laboratory, and to L. G. Humphreys and C. E. Osgood for extremely valuable critical comments on an earlier version of this paper.

The most extensive of these factor analytic studies is the Osgood, Suci, and Tannenbaum (1957, pp. 47-64) study, in which five additional factors were identified: stability (sober–drunk, stable–unstable), tautness (angular–rounded), novelty (new–old), receptivity (colorful–colorless), and aggressiveness (aggressive–defensive).


In the factor analyses mentioned above, the concepts that were judged were a sort of random sample of the concepts used in a language (e.g., foreigner, modern art, knife, boulder, sin, snow, time, me, birth hospital, America). In other words, the complete domain of meaning was sampled, even if sketchily. In the most thorough of these studies (Osgood et al., 1957) the scales were selected by sampling from Roget’s Thesaurus. Thus, the scales were also selected to sample a relatively large domain of meaning. In the present study both the concepts and the scales were restricted to a specific domain of meaning, namely jobs. The following questions were investigated: (a) Does the restriction of the domain have an effect on the factorial structure? (b) Is there a difference in the factorial structure when two populations are used that might logically be expected to be different on the particular domain of meaning?
METHOD


Selection of scales. Twenty-five managers and 20 workers were presented with triads of various jobs, such as “your present job,” “your previous job,” “a job you would like to have,” “job that is very interesting,” etc. For details see Triandis (1959a). The Ss were asked “Which one of these jobs is more different from the other two?” “Which characteristic makes it more different?” and “What is the opposite of this characteristic?” For example, the triad teacher, welder, and clerk may provide the scale manual–white collar (job). The lists of characteristics obtained are discussed in Triandis (1959a). A stratified random sample of the characteristics so listed constituted 28 scales for the semantic differential used in the present study. An additional 10 scales were taken from Osgood et al. (1957) so as to represent seven of the eight factors of their analysis. We did not include scales for the tautness factor, because it does not obviously apply to jobs; Triandis (1959c) has shown that it is unwise to ask workers to judge concepts against scales that appear irrelevant to them. The actual semantic differential and the instructions may be found in Triandis (1958, pp. 296-298).

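The triad procedure above follows the logic of Kelly’s Repertory Test. As a minimal sketch, assuming that every combination of three jobs in a list is presented (the text does not say how the triads were drawn), one S’s answers can be recorded as a bipolar scale. The function names `triads` and `elicit_scale` and the record layout are hypothetical, and treating “manual” as the welder’s distinguishing characteristic is only an illustrative reading of the text’s example.

```python
from itertools import combinations


def triads(elements):
    """Every unordered set of three elements (jobs) from the list."""
    return list(combinations(elements, 3))


def elicit_scale(triad, odd_one, characteristic, opposite):
    """Record one S's answers to the three triad questions as a bipolar scale."""
    if odd_one not in triad:
        raise ValueError("the job singled out must belong to the triad")
    return {"triad": triad, "odd_one": odd_one,
            "scale": (characteristic, opposite)}


# Illustrative reading of the text's example: the welder differs from
# teacher and clerk, which yields the scale manual-white collar.
answer = elicit_scale(("teacher", "welder", "clerk"),
                      odd_one="welder",
                      characteristic="manual",
                      opposite="white collar")
```

With five jobs this yields ten triads. The elicited pairs would then be pooled across Ss before the stratified sampling of scales, whose strata the text does not describe.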
Selection of concepts. We chose five jobs that 
were representative of the jobs with which our Ss 
have had some experience. The five jobs were: 
Welder, Teacher, Vice President, Personnel Director, 
and Clerk. 

Procedure. The 38-scale semantic differential was administered to 47 managers and clerks with supervisory responsibility and 56 workers (for details of the sample and the reliabilities see Triandis, 1959a, 1959c). The order of the jobs rated was counterbalanced. The procedure of Osgood et al. (1957, p. 81, Form II) was used. The 235 judgments obtained from the managers (47 Ss × 5 jobs) and the 280 judgments obtained from the workers (56 Ss × 5 jobs) on each of the 38 scales were intercorrelated, so that two 38 × 38 matrices of intercorrelations were computed. Two centroid (Thurstone, 1947) factor analyses, with fixed communalities, were performed, one on the manager matrix and the other on the worker matrix. Fifteen factors were extracted, and the six factors that collectively accounted for more than 50% of the variance were rotated by the varimax method (Kaiser, 1958). This rotation maximizes the variance of the squared loadings of a factor and leads to both an approximation of simple structure and factorial invariance.

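The Procedure paragraph compresses three computational steps: intercorrelating the judgments, centroid extraction with fixed communalities, and varimax rotation. The sketch below illustrates them on synthetic data and is not a reproduction of the original computation. The row layout (one row per S-job judgment, so 47 × 5 = 235 rows for the managers), the communality estimate (largest absolute off-diagonal r in each row), the sign-reflection rule, and the names `centroid_extract` and `varimax` are all assumptions. The varimax step uses a common SVD-based iteration with Kaiser’s row normalization rather than Kaiser’s original pairwise planar rotations.

```python
import numpy as np

# Hypothetical layout: one row per (S, job) judgment, one column per scale.
N_MANAGERS, N_WORKERS, N_JOBS, N_SCALES = 47, 56, 5, 38


def centroid_extract(R, n_factors, communalities):
    """Thurstone-style centroid extraction with fixed communalities."""
    resid = np.array(R, dtype=float)
    np.fill_diagonal(resid, communalities)
    p = resid.shape[0]
    loadings = np.zeros((p, n_factors))
    for f in range(n_factors):
        signs = np.ones(p)
        # Reflect variables whose off-diagonal sum is negative; each flip
        # raises the grand total of the matrix, so the loop terminates.
        while True:
            Rs = resid * np.outer(signs, signs)
            off_sums = Rs.sum(axis=0) - np.diag(Rs)
            j = int(np.argmin(off_sums))
            if off_sums[j] >= -1e-12:
                break
            signs[j] = -signs[j]
        total = Rs.sum()
        if total <= 0:  # nothing left worth extracting
            break
        # Centroid loadings of the reflected matrix, then undo the reflection.
        a = signs * Rs.sum(axis=0) / np.sqrt(total)
        loadings[:, f] = a
        resid -= np.outer(a, a)  # residual matrix for the next factor
    return loadings


def varimax(L, max_iter=100, tol=1e-6):
    """Varimax rotation with Kaiser (row) normalization."""
    h = np.sqrt((L ** 2).sum(axis=1))
    h[h == 0] = 1.0
    Ln = L / h[:, None]
    T = np.eye(Ln.shape[1])
    crit = 0.0
    for _ in range(max_iter):
        LR = Ln @ T
        # Gradient of the criterion: variance of squared loadings per factor.
        G = Ln.T @ (LR ** 3 - LR * (LR ** 2).mean(axis=0))
        u, s, vt = np.linalg.svd(G)
        T = u @ vt  # nearest orthogonal rotation
        if s.sum() < crit * (1 + tol):
            break
        crit = s.sum()
    return (Ln @ T) * h[:, None]


# Synthetic stand-in for the manager data (not the study's judgments).
rng = np.random.default_rng(0)
n_rows = N_MANAGERS * N_JOBS
latent = rng.normal(size=(n_rows, 3))
X = latent @ rng.normal(size=(3, N_SCALES)) + rng.normal(size=(n_rows, N_SCALES))
R = np.corrcoef(X, rowvar=False)  # 38 x 38 intercorrelations
h2 = np.abs(R - np.eye(N_SCALES)).max(axis=1)  # assumed communality estimate
unrotated = centroid_extract(R, 15, h2)
rotated = varimax(unrotated[:, :6])
```

Because varimax is an orthogonal rotation, each scale’s communality over the six retained factors is unchanged; only its distribution across those factors shifts toward simple structure.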

RESULTS 
Over-all similarity of matrices. Fifty cell entries from the matrix of intercorrelations obtained from the management group were selected at random and were correlated with the 50 corresponding entries of the worker matrix. The Spearman-Brown corrected Pearson r for the total matrices (703 entries, i.e., the 38 × 37/2 distinct off-diagonal cells of each matrix) was .98. Thus, the two matrices are very similar.

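The text does not spell out how the Spearman-Brown correction was applied. One reading, sketched here as an assumption, treats the 50 sampled cells as a short “test” and steps the observed r up by the length factor k = 703/50 using r_k = k r / (1 + (k − 1) r). The observed r of .70 below is a placeholder, not a value from the study.

```python
def spearman_brown(r: float, k: float) -> float:
    """Spearman-Brown formula: correlation for a 'test' k times as long."""
    return k * r / (1 + (k - 1) * r)


N_SCALES = 38
N_SAMPLED = 50
n_cells = N_SCALES * (N_SCALES - 1) // 2  # distinct off-diagonal cells
k = n_cells / N_SAMPLED                   # assumed length factor, 703/50

observed_r = 0.70                         # hypothetical r over the 50 cells
corrected_r = spearman_brown(observed_r, k)
```

Stepping back down with the factor 1/k recovers the observed value, since the formula multiplies the odds r/(1 − r) by k.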
The factors. Table 1 shows the six rotated factors for each of the groups. A correlational comparison of the factors shows that Factor I of the management group is most similar to Factor I of the worker group (r = .88), though it is also somewhat similar to the fifth worker factor (r = .34). The remaining factors also correspond rather well, except for the sixth factor.

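A correlational comparison of factors can be sketched as the Pearson r between the 38 loadings of each management factor and each worker factor; the best match for a factor is the column with the largest absolute r. The loading matrices here are random placeholders, and Tucker’s coefficient of congruence is included only as a common alternative index, not as the method used in the study. The function names are hypothetical.

```python
import numpy as np


def factor_cross_correlations(L_a, L_b):
    """Pearson r between each column of L_a and each column of L_b."""
    k_a = L_a.shape[1]
    full = np.corrcoef(L_a.T, L_b.T)  # rows are factors, columns are scales
    return full[:k_a, k_a:]


def congruence(L_a, L_b):
    """Tucker's coefficient of congruence (uncentered alternative)."""
    num = L_a.T @ L_b
    den = np.outer(np.sqrt((L_a ** 2).sum(axis=0)),
                   np.sqrt((L_b ** 2).sum(axis=0)))
    return num / den


rng = np.random.default_rng(1)
managers = rng.normal(size=(38, 6))   # hypothetical 38 x 6 loadings
workers = managers[:, ::-1].copy()    # same factors in reversed order
best = np.abs(factor_cross_correlations(managers, workers)).argmax(axis=1)
```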
The first factor, in both samples, will be called an “objective job evaluation” factor. It has high loadings on requires experience, good, important, involves diagnosing trouble, involves much responsibility, professional, creative, skilled, manager, requires much education, requires very much intelligence, and very much training. It could also be called a “job complexity” factor. In addition to the above scales that have high loadings on this factor, we find that the management sample analysis shows high loadings on desirable, no routine, hard, high position, sociable, high pay, aggressive, difficult, executive, interesting, and challenging. This suggests that the objective job evaluation or job complexity factor is more evaluative for the managers than for the workers. Job complexity is clearly more desirable and more “good” for the managers. This suggests that the semantic differential might conceivably be used as a selection test for executives, to tap the extent to which a particular person values jobs that are difficult and complex, yet interesting and challenging.

The second factor, in both samples, is a white collar factor. It is highly loaded on indoors, soft, clean, sociable, office work, passive, executive, sales-minded, light, manager, and sitting. It has loadings on the heavy–light and hard–soft scales, like Osgood’s potency factor, but the loadings have opposite signs, so that it is a potency factor only if the signs are reversed, in which case it becomes an outdoor–hard–dirty–unsociable–factory work–active–heavy–standing–worker, or “physical potency,” factor.

The third factor is a variety factor. It is highly loaded on changeable, involves diagnosing trouble, new, executive, creative, and travels around, for both groups, and in the case of the management sample it also has high loadings on variable, outdoors, does many things, interesting, and challenging.

The fourth factor is a job level factor. It is loaded on desirable, variable, high position, steady, high pay, difficult, manager, and doing many things.

The fifth factor is a “subjective job evaluation” factor. It has high loadings on desirable, good, important, responsible, alert, active, ingenious, creative, interesting, and challenging.

Finally, the sixth management factor is a kind of dynamism factor, that is, a fusion of Osgood’s potency and activity factors. It has high loadings on variable, hard, and active. It was previously identified in a number of studies (e.g., Triandis & Osgood, 1958). The sixth worker factor is uninterpretable.

DISCUSSION 


The first question to which the present study was directed is whether restriction of the domain of meaning has an effect on the factor structure. Clearly it does. If we consider the good–bad and desirable–undesirable scales, which represent the “pure evaluation” factor isolated in studies of the broad domain of meaning, we find that they have only moderately high, and approximately equal, loadings on the first and fourth management factors and on the first and fifth worker factors. This means that the pure evaluation factor bisects the angle between the objective job evaluation and the job level factors of the managers, and between the objective job evaluation and subjective job evaluation factors of the workers. In other words, looking at it from the opposite direction of causation, the managers evaluate jobs in terms of job complexity and job level, the workers in terms of job complexity and subjective evaluation (desirable, good, important, professional, interesting, challenging, etc.). Such an interpretation is also consistent with the analyses of the “categories of thought” of the managers and workers (Triandis, 1959a).

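The geometric claim that an evaluative scale with approximately equal loadings on two orthogonal factors lies midway between them can be checked directly: within the plane of those two factors, the angle to each axis is the arccosine of the normalized loading. The loadings of .55 below are hypothetical, since the text reports only that they are moderately high and approximately equal, and the sketch ignores the scale’s loadings on the remaining factors.

```python
import numpy as np


def angles_to_axes(loadings):
    """Angle in degrees between a scale vector and each orthogonal factor axis."""
    v = np.asarray(loadings, dtype=float)
    return np.degrees(np.arccos(v / np.linalg.norm(v)))


# Hypothetical good-bad loadings on two orthogonal factors (e.g., I and IV).
theta = angles_to_axes([0.55, 0.55])
```

Equal loadings give equal angles to both axes, i.e., the scale bisects the 90° angle between the two factors.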
Thus, we have in the present analysis a shift in emphasis from the broad factors of evaluation, potency, and activity found in the factor analyses of the broad domain of meaning to a set of factors that is more appropriate for the subject matter under consideration, namely, jobs. Instead of evaluation we have objective and subjective job evaluation factors. Instead of potency and activity we have a fusion of the two in a relatively insignificant dynamism factor. New factors, such as the white collar, variety, and job level factors, that are specific to the job domain of meaning, have taken the place of the potency and activity factors and account for a portion of the variance that was previously accounted for by these factors.

The second question was whether there is a difference in the factorial structure of the two populations (managers and workers). Management and workers, one might expect, will differ in their factorial structures when the domain of jobs is considered. The answer is that they do not. The factor structures are surprisingly similar. The correlation of the matrices of intercorrelation suggests that we are dealing with essentially the same way of looking at the job domain. Only in the details of emphasis is it possible to find differences between managers and workers. The differences of emphasis are seen upon examination of the loadings of the first factor, which suggest that job complexity, interest, and challenge are a greater “good” for the managers than for the workers. Similarly, examination of the fourth factor indicates that low position is clearly undesirable for the managers, but not for the workers.

Thus, in summary, there are no major differences in the factorial structure of the job domain of managers and workers in the present study, though the correspondence between the factors is only approximate, because there are some important differences in emphasis that do change the character of the corresponding factors.

The present study also throws more light upon the problem of concept-scale interaction in the semantic differential (Osgood et al., 1957, pp. 176-188). Our evidence is consistent with the generalization made by these authors that “in the process of human judgment, all scales tend to shift in meaning toward parallelism with the dominant (characteristic) attribute of the concept being judged.” In our case, it appears, job complexity and job level are important attributes of jobs, and the evaluative factor bisects the angle between these two factors. The content, emphasis, and importance of the factors extracted from a semantic differential appear to be dependent on the domain of meaning under consideration. This is also clear from other studies. For instance, Tucker (1955) used the semantic differential with nonartists who judged representational paintings, Solomon (1954) used sonar sounds as his stimuli, Hofstätter (1959, pp. 258-259) used the central concepts of the philosophic system of Empedocles, and Morris, Osgood, and Ware² and Osgood (1952) did a factor analysis of the semantic differential judgments of the ways of life (values) of Morris; in all these studies there was much evidence of concept-scale interaction. In the first study, for instance, wet–dry had a loading of −.89 on the evaluative factor, while the loading in factor analyses of the broad domain of meaning was negligible (the loading in Osgood’s Thesaurus analysis was .08).

The present study does not answer the query whether the same factors will emerge in another study of the job domain, with other samples of the industrial population. Some work recently completed by Koene³ with Dutch workers suggests, however, that at least the first two factors will be identified in such a study. Koene attempted a replication and extension of certain aspects of the Triandis (1958) study. In an exploratory study with only seven Ss, using the job version of Kelly’s Repertory Test that is described under “Method” above, he obtained three factors after a Kelly-type factor analysis (Kelly, 1955, p. 277 ff.). The Ss in Koene’s sample were coke oven employees; they judged specific jobs from their own shop. The first factor was a job complexity factor and included the following job categories: skilled worker, highly trained maintenance worker, possesses a “feeling” for machines, understands theory, has had construction training, can read drawings, specialist, not unskilled laborer. Comparison of Koene’s and the present writer’s first factor indicates that they are really the same. The second of Koene’s factors included the following: is leader, administrator, makes out requisitions, does not punch in, gives orders, has clerical training, sitting (not standing), does mental work. Again there is very clear correspondence with the second factor of the present study. Koene called it a leadership factor; perhaps the best name is white collar factor. Koene’s third factor was very specific. It included categories such as “he co-operates on pipefitting,” “works between the rails,” etc.

The fact that the first two factors of the present study were replicated with a different sample of Ss, using a different testing method and procedure, a sample of jobs from a more restricted domain than the present study’s, a different language, and a different factor analytic procedure, suggests that the first two factors will appear in future cross-validations.

² Morris, C., Osgood, C. E., and Ware, E. E. Analysis of the connotative meanings of a variety of human values as expressed by American college students. Mimeographed manuscript, 195?

³ G. B. Koene, personal communication, May 1959.

In summary, the present study suggests that there is a semantic structure that is applicable to unrestricted domains of meaning. In restricted domains this structure is modified to fit the particular domain. Groups of Ss that for various reasons perceive the particular domain differentially tend to have somewhat different semantic structures. These deviations from the general pattern are probably even more accentuated at the individual level, where substantial differences in semantic structure are likely to occur. These differences are of some importance. For instance, Triandis (1959b) has shown that the greater the difference in the semantic structures of supervisors and subordinates, the less effective is the communication between the two, and the smaller is the interpersonal attraction between the two Ss.


SUMMARY 


Osgood’s factor analysis of meaning was replicated with a more restricted domain of meaning, namely, jobs. Forty-seven managers and 56 workers completed a 38-scale semantic differential with five jobs. The scales of the differential were selected by sampling lists of attributes of jobs obtained from 25 managers and 20 workers with an adaptation of Kelly’s Repertory Test. A centroid factor analysis extracted 15 factors, and 6 of these were rotated by the varimax method. A number of differences were found between the factors obtained by this procedure and the factors identified by Osgood and his co-workers. There were also certain differences between the factors obtained from the managers and those obtained from the workers.
DEFENSE MECHANISM TEST:


A NEW METHOD FOR DIAGNOSIS AND PERSONNEL SELECTION 


ULF KRAGH 


Institute of Military Psychology, Stockholm, Sweden 


In this article, the outlines of an analysis of 
precognitive “defensive” organization for di- 
agnostic and selection aims will be given. The 
method is preliminarily termed the Defense 
Mechanism Test (DMT). It is administered 
as a group test; the techniques consist roughly 
in the repeated subliminal exposure of a TAT- 
like picture, exposure time being increased by 
steps. At each exposure, the Ss make a sketchy 
drawing (or a marking) of what they have 
seen, and write a short comment. 

During the last decade, the problem of 
“perceptual defense” has attracted much in- 
terest. But even though there is affinity of 
approach, decisive differences also exist be- 
tween experiments on perceptual defense and 
the analysis of precognitive defensive organ- 
ization, and the latter has not grown out of 
the former (Kragh, 1960b). 

The experiment is related to three basic 
conceptual frames: the actual-genetic or “de- 
velopmental” model of perception and person- 
ality, the psychoanalytic theory of defense 
mechanisms, and the theory and practice of 
projective techniques. 

The actual-genetic¹ model of perception-personality forms a comprehensive conceptual frame (Kragh, 1955, 1959, 1960a, 1960b; Smith, 1957b), which has been elaborated in close interaction with experiments on precognitive organization and the development of percepts (Smith, 1956, 1957a, mimeo.; Smith & Henriksson, 1956; Smith & Nyman, 1959). The method presented in this article does not, however, necessarily presuppose the acceptance of the model as a whole (including the notions of a successive determination of links in fractioned precognition, correspondence between micro- and macrogenesis, etc.).

According to S. Freud (1949), the dynamics of defense mechanisms roughly comprise three links: threat/danger, anxiety, and defense against anxiety. In the experimental procedure, these three links correspond to subliminal threat, manifest or submanifest anxiety, and precognitive defensive organization. The relations between the analysis of precognitive defensive organization on the one hand and psychoanalytic defense mechanisms on the other have been discussed in other contexts (Kragh, 1955, 1959).

¹ The term “actualgenesis” (Aktualgenese) was coined by F. Sander.

The DMT may be classified as a projective 
method because “objective cues” are min- 
imized and the reference to the “perceiver” 
maximized, “projection” of needs (etc.) tak- 
ing place, and because perception is said to 
“mirror” the entire personality. Identification 
with a “hero” is supposed to occur in the 
same way as in the TAT. 

There are, however, some essential differ- 
ences between current projective methods and 
the DMT: 


1. The DMT is purposely constructed for the diagnosis of defense mechanisms. It also allows other types of analysis, say of whole responses, movement responses, needs, etc., but this is not a primary aim of the interpretation.

2. Due to reduction of stimulus intensity by utilizing tachistoscopic exposure, early precognitive levels are likely to become more easily activated in the DMT than in other projective techniques. In a series of tachistoscopic exposures of a TAT picture, the “ordinary” TAT thus corresponds to the last exposure of the picture (or rather, “full-time” exposure). In spite of the weak stimulus, there are the same possibilities of referring responses to stimulus as in the ordinary TAT.

3. In the DMT, stimulus intensity may be increased by steps, effecting an ordered sequence of precognitive levels in the direction from earlier levels to later (more stimulus-adequate) levels. The test informs us about the level of a particular precognitive organization, and permits a strict sequence analysis. Current projective techniques, on the other hand, deal with precognitive material activated in random order, and it is difficult, if not impossible, to obtain an adequate analysis of precognitive levels and sequences.

4. The operational character of the conceptual frame thus seems to be more pronounced in the DMT than in other projective methods. As a consequence, there would be no such great need for interpreting data by reference to “extrinsic” frames (e.g., concepts like “abilities,” “traits,” “needs”).

5. The DMT has been worked out so as to be particularly suited for the diagnosis also of normal persons, and notably for personnel selection. The assumption was made that there would be no sharp borderlines between normal and abnormal Ss as regards the quality and the quantity of defense mechanisms; the psychopathology of precognitive defensive organization has, however, been studied only in rather modest samples (Kragh, 1955; Smith, mimeo; Smith & Nyman, 1959).


It seems plausible to assume that persons in whom defense mechanisms are constantly at work should also have difficulties in resisting the stress of, e.g., aviation training. This would be due in part to the successive reduction of energy resources in connection with (manifest or submanifest) anxiety upon which defense mechanisms are superimposed. Anxiety would be induced by hearing of casualties within the Air Force, by S’s own casualties, and by those of his comrades. Ss with strong defense mechanisms would also prove accident-prone during aerobatic flying, due to deficiencies of reality-testing, and thus eventually fail in the training. In an analogous way, it was assumed that the threat in the test picture would activate (manifest or submanifest) anxiety superimposed by defensive organization in neurotic Ss, resulting in stimulus-inadequate response. In persons without marked defense mechanisms, anxiety would perhaps become activated, but the development of their percepts would proceed towards correct recognition without intervening defensive organization.

It is a commonly accepted notion that defense mechanisms disturb reality-testing either in a general or in a more specific way. Granted a thoroughgoing correspondence between the defense mechanisms and the precognitive defensive organization, the DMT would also be suited for diagnostic and personnel selection aims other than those treated here.


PROCEDURE AND DEFINITIONS 


The procedure has been described in detail elsewhere (Kragh, 1959).

The exposures are made with a camera shutter. There are 12 exposures of each picture, in msecs.: 20, 40, 40, 100, 100, 200, 200, 200, 500, 500, 500, 500.

The two stimuli employed here are the TAT picture Number 1 (boy with a violin), the head and shoulders of a threatening and ugly male person having been inserted at the right above the boy, and a “parallel” picture with a young man centrally placed and an old, ugly-looking man above him. The experiments are conducted with groups of not more than about 40 Ss. The first row of Ss is placed at a distance of about 6 m. from the screen.

The Ss are instructed to make a drawing of what they have seen on the screen without paying attention to whether their impression is correct or not. For each exposure they have to make a drawing in a square. If they feel unable to make any kind of drawing, they are also allowed to make markings instead of a drawing. In addition to the drawings (markings), a written comment is also required so as to make quite clear what each S has seen, and at what place on the screen. The Ss are told to work as fast as they can without neglecting the thoroughness of description. The time interval between the exposures is determined by the time required by the slowest S for reporting. It has rarely been more than about 3 min. The exposure times and the illumination of the room have been devised so as to “catch” as much of the precognitive development of each S as possible. In order not to make the picture motif too familiar, it has, however, proved necessary to “cut off” the very end of the development in most of the Ss. This has been effected by utilizing exposure times below 1 sec., together with “normal” illumination of the room.

In all, 412 aviation cadets took part in the experiments; their ages vary between 17 and 22 years. They had all been subjected to a preliminary screening procedure, medical and psychological. None has an IQ much below the average, and visual acuity is normal. Their educational level varies between elementary school and college (Swedish Gymnasium).

Definitions: Phase denotes the sum total of data registered in connection with one exposure. In the experiments referred to here, the first phase corresponds to the exposure at which S has first seen something to which he attributes a meaning, and the last phase to that exposure at which he has seen the picture “correctly.” We term P-phases all those phases which precede the last phase. Hero denotes the person who is seen (drawn, marked) by the S at the place of the main person in the picture. Secondary denotes the person who is seen at the place of the secondary person in the picture. At the beginning of many series only one person is seen; in such cases this person is also termed hero, on condition that he is not situated at the place of the secondary person in the picture.
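The phase definitions above amount to a small bookkeeping rule over the 12 exposures. The sketch below assumes a hypothetical per-exposure record with two flags (meaning attributed, picture seen correctly); the class and function names are illustrative. Because the procedure cuts off the very end of the development in most Ss, a series may never reach correct recognition; how such a series is handled here is an assumption, flagged in the code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Exposure times (msec) for the 12 exposures of each picture.
EXPOSURES_MS = [20, 40, 40, 100, 100, 200, 200, 200, 500, 500, 500, 500]


@dataclass
class Report:
    """Hypothetical coding of one S's drawing and comment for one exposure."""
    meaningful: bool  # S attributes a meaning to what he has seen
    correct: bool     # S has seen the picture "correctly"


@dataclass
class PhaseSeries:
    first: Optional[int]  # exposure index of the first phase
    last: Optional[int]   # exposure index of the last phase
    p_phases: List[int] = field(default_factory=list)

    @property
    def breadth(self) -> int:
        """Phase breadth: the number of P-phases."""
        return len(self.p_phases)


def phase_series(reports: List[Report]) -> PhaseSeries:
    first = next((i for i, r in enumerate(reports) if r.meaningful), None)
    if first is None:
        return PhaseSeries(None, None)
    last = next((i for i in range(first, len(reports)) if reports[i].correct), None)
    # Assumption: if S never sees the picture correctly, every phase from
    # the first one onward is treated as a P-phase.
    end = last if last is not None else len(reports)
    return PhaseSeries(first, last, list(range(first, end)))


reports = ([Report(False, False)] * 2 + [Report(True, False)] * 7
           + [Report(True, True)] * 3)
series = phase_series(reports)
```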
Typification, Combinations, and “Strength” of Precognitive “Defensive” Organization
The material for the following preliminary typification consists of the drawings and reports of aviation cadets (mainly those who failed in primary and basic training) who had been subjected to two “threat” stimuli. An elaborated form of the typification serves as a code for the raters’ prognosis of pass and fail. The typification refers to the “classical” psychoanalytic defense mechanisms (A. Freud, 1946). For details, the reader is referred to recent publications (Kragh, 1959, 1960b).


Repression: The hero and/or the secondary have the quality of stiffness, rigidity, or lifelessness, or of being “disguised,” or is (are) seen as an animal.

Isolation: The hero and the secondary are separated from each other; the threat (isolated) is excluded altogether; early phases (all P-phases) are excluded (isolated).

Denial: The threat is explicitly denied.

Reaction formation: The threat is turned into its opposite.

Identification with the aggressor: The hero becomes the aggressor.

Turning against the self (intro-aggression): The hero, or the main person’s instrument (in, e.g., TAT 1: the violin), is damaged or worthless, or the instrument is a threat to the hero.

Identification with a female role (not directly identifiable as one of the “classical” defense mechanisms): The hero (in the picture, a male main person) is female, and the main person’s instrument is seen as a female attribute, or as a round and/or hollow object.

The typification of precognitive defensive organization refers to one type of organization in one single phase, or in the sequence of two phases.


As regards the strength (negative weight) 
of a particular defensive organization, pre- 
liminary norms have been established which 
will not, however, be discussed in this context. 

In the same way as defense mechanisms are likely to combine in the individual case (cf. Fenichel, 1945, p. 153), two or more types of precognitive defensive organization may also combine in one phase or in many successive phases. Combinations may occur either in the form of “mixed types” (borderline cases), or as two or more distinguishable types of organization in one and the same phase.


Examples of “mixed types” are: the secondary is a lamp, a window, a painting (repression and isolation); the secondary is “drawn in the wrong way” (isolation and denial); the hero protects the secondary (reaction formation and identification with the aggressor).


Phase Level and Phase Sequence 


The phase level may be defined as exposure 
time or as phase number. The exposure time 
in terms of threshold value has played a 
prominent role within the discussion on “per- 
ceptual defense,” but the results obtained are 
contradictory. It also seems as if the absolute 
thresholds for precognition and for correct 
cognition would be of but little value for 
diagnostic purposes (because largely due to 
physiological factors?). On the other hand it 
would be significant if an S has particularly 
many P-phases (= great phase breadth) be- 
fore attaining the last phase. 

Phase level in terms of phase number is important. Two Ss may have the same type of organization at, say, the tenth exposure, but its significance will be different if nine P-phases precede, or only two.


The phase level may be defined in relation to the first phase, or to the last phase, or to both. Reference to the last phase would be important (a) in the light of M. I. Stein’s (1949) and G. S. Blum’s (1954) findings that defense mechanisms are likely to become activated at rather long exposure times, and (b) because it would be natural to think of defensive organization close to the last phase (the “level of reality”) as particularly grave, and possibly indicating a thoroughgoing deficiency of reality testing.


In the experiments we are concerned with 
here, each series of phases is divided into 
three phase levels: an early, a medium, and a 
late phase level, with an approximately equal 
number of phases in each phase level. 
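The three-level split can be sketched as follows. This is only an illustration: the function name `phase_levels` is hypothetical, and the text says only that each level holds an approximately equal number of phases, so the rule used here for leftover phases (they go to the earlier levels) is an assumption.

```python
def phase_levels(n_phases: int) -> list[str]:
    """Label phases 1..n_phases as 'early', 'medium', or 'late'.

    Assumption (not stated in the text): when n_phases is not divisible
    by 3, the remainder goes to the earliest levels, so level sizes
    differ by at most one phase.
    """
    if n_phases < 3:
        raise ValueError("need at least three phases to form three levels")
    base, extra = divmod(n_phases, 3)
    labels: list[str] = []
    for i, name in enumerate(("early", "medium", "late")):
        size = base + (1 if i < extra else 0)
        labels.extend([name] * size)
    return labels


# Example: a hypothetical series of 10 phases.
levels = phase_levels(10)
print({name: levels.count(name) for name in ("early", "medium", "late")})
# {'early': 4, 'medium': 3, 'late': 3}
```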

It should be stressed that in the strict sense, 
no interpretation of precognitive defensive 
organization should be made without simul- 
taneous consideration of its place in the phase 
sequence. Analysis of phase sequence may in- 
clude two phases or more; a complete analysis 
comprises the whole series, which will probably prove indispensable for clinical use. In personnel selection, on the other hand, the sequence analysis proper may be restricted to two phases and to some general aspects, e.g., that of diminishing (or, conversely, increasing) correctness of the organization towards the end of the series.
Coding and Rating


For each S, and for each exposure, defen- 
sive organization is coded on the basis of 
drawing (marking) and comments. The code was
developed primarily on two groups of aviation
cadets: autumn 1956 (N = 74) and spring 1957
(N = 77). A trial validation was also made
for the latter group.

Two types of codes have been used: a com- 
plete one, where all varieties of defensive or- 
ganization are coded, and an abbreviated one, 
where only the main types are coded. Every 
coded sign is at the same time a (prognost- 
ically) “negative sign.” 


The following directions have been given 
for a 5-degree rating scale: 

5. No negative signs in either of the two series

4. One weak sign in one of the series 

3. One weak sign in both series on early or medium 
phase level, or one strong sign on early or medium 
phase level in one series, or two weak signs on early 
or medium phase level in one series 

2. One strong sign in two or more phases on late 
phase level in one series, or two corresponding weak 
signs on late phase level in two series 

1. One corresponding sign on late phase level in 
both series, or many signs on late phase level in both 
series, or many strong signs in many phases on late 
phase level 


Ratings 3-5 have been used to denote pass
prognosis, 1-2 to denote fail prognosis.
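A minimal sketch of the pass-fail cutoff follows. It encodes only the final step (ratings 3-5 pass, 1-2 fail), not the sign-counting directions that produce a rating; the helper names and the example ratings are hypothetical. The proportion it returns is the figure a rater would compare with the expected pass-fail proportion of about 50%.

```python
PASS_RATINGS = {3, 4, 5}  # ratings denoting pass prognosis
FAIL_RATINGS = {1, 2}     # ratings denoting fail prognosis


def prognosis(rating: int) -> str:
    """Map a 5-degree DMT rating to a pass/fail prognosis."""
    if rating in PASS_RATINGS:
        return "pass"
    if rating in FAIL_RATINGS:
        return "fail"
    raise ValueError(f"rating must be 1-5, got {rating!r}")


def prognosticated_pass_proportion(ratings: list[int]) -> float:
    """Share of Ss given a pass prognosis."""
    passes = sum(1 for r in ratings if prognosis(r) == "pass")
    return passes / len(ratings)


# Hypothetical ratings for eight Ss.
print(prognosticated_pass_proportion([5, 4, 3, 3, 2, 2, 1, 4]))  # 0.625
```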

No fixed norm for the distribution of the 
ratings has been used; the raters have, how- 
ever, as a rule made their ratings in accord- 
ance with an expected pass-fail proportion of 
about 50%. There has been one chief rater 
(the author, A) for all the groups, and four 
co-raters (B-E). Raters A, B, and E have 
utilized the complete version of the code, 
Raters C (except for the attack divers) and 
D the abbreviated version. The co-raters’ 
training in coding and rating has lasted 14, 
2, 7, and 7 days for Raters B, C, D, and E, 
respectively. 

Ratings of Swedish aviation cadets and of 
Danish attack divers have been made on the 
basis of DMT protocols exclusively. Testing 
as well as ratings have taken place at the very 
beginning of the primary training, before the 
onset of eliminations. 

Pass-fail criteria have been available for two years of aviation cadets (1957-1958): officer, reserve officer, and engineer officer (N = 21) candidates with college, and aviation cadets with only high school or elementary school. Demands for passing primary
and basic aviation training are practically 
identical for all cadets. For one small group 
of Danish attack divers (1958), the criteria 
have consisted in ranked fitness for the serv- 
ice made by the command, for a second group 
(1959) in the (temporal) order of discharge 
from service due to unfitness. 

Coding and rating of the two series of one 
S take about 5-10 min. for a trained rater. 
One rater may accomplish about 30 ratings
a day, but hardly many more.


RESULTS AND DISCUSSION 


In the following, coefficients of validity 
(biserial r) and interrater reliability (Pear- 
son r) of the DMT, and intercorrelations 
(Pearson r) between DMT ratings and the 
tests and ratings forming part of the pre- 
liminary screening battery will be given. The 
coefficients have been calculated in accord- 
ance with standard methods. 
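For readers who want to reproduce the two kinds of coefficient, here is a sketch using the textbook formulas: biserial r between the 5-point DMT rating and the dichotomous pass-fail criterion, and Pearson r between two raters' ratings. The text does not spell out its "standard methods," so these formulas are an assumption, as are the function names and the example data.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(scores: list[float], passed: list[bool]) -> float:
    """Biserial r between a continuous predictor and a pass/fail criterion.

    r_b = (M_p - M_q) / s_t * (p * q / y), where M_p and M_q are the
    predictor means of the pass and fail groups, s_t is the standard
    deviation of the whole group, p and q are the pass and fail
    proportions, and y is the normal ordinate at the p/q split.
    """
    pass_scores = [s for s, ok in zip(scores, passed) if ok]
    fail_scores = [s for s, ok in zip(scores, passed) if not ok]
    if not pass_scores or not fail_scores:
        raise ValueError("criterion must contain both passes and fails")
    p = len(pass_scores) / len(scores)
    q = 1.0 - p
    y = NormalDist().pdf(NormalDist().inv_cdf(p))  # ordinate at the split
    return (mean(pass_scores) - mean(fail_scores)) / pstdev(scores) * (p * q / y)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation, e.g., between two raters' ratings."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: ratings of two raters and the pass/fail criterion.
rater_a = [5, 4, 4, 3, 2, 1, 3, 2]
rater_b = [5, 3, 4, 3, 2, 2, 4, 1]
passed = [True, True, True, False, False, False, True, False]
print(round(biserial_r(rater_a, passed), 2))  # validity of Rater A
print(round(pearson_r(rater_a, rater_b), 2))  # interrater reliability
```

Biserial r assumes a normally distributed variable underlying the pass-fail split; under that assumption it equals the point-biserial r multiplied by sqrt(pq)/y.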


In the tables, the number of Ss may be somewhat reduced in the computations of coefficients, owing to lack of data. If more than five Ss are lacking, the correct N is given. In Tables 1-3, Co, Crf, Ci = officer, reserve officer, and engineer officer candidates; Er and Ef = aviation cadets with high school and with only elementary school, respectively. “Intelligence” is an intelligence test (series completion) for the Co, Crf, Ci, and Er groups; “Intelligence Ef” is a combined intelligence and achievement test for the Ef group (arithmetic problems, general information, spelling, verbal reasoning, series completion). “Mechanical” is a test of mechanical reasoning; “Aviation information” measures the S’s information in different areas of aviation. “Attention distribution” and (two-hand) “Coordination” are psychomotor tests. The psychologists’ (“Psychologist”) ratings are based on these tests (except for the psychomotor tests), some inventories, and an interview. The “Board” decides on the preliminary screening and makes its ratings after having been informed of the results of the tests and of the psychologist’s report (but not of his rating), and after a short interview with the applicant before the Board.


The interrater reliabilities are low to satisfactory in comparison with ordinary aptitude tests, and as good as or even better than those of ratings based on projective methods. The
reliability coefficients seem to correlate posi- 
tively with increased training of the rater, and 
with the use of the complete code version. The coefficients vary between .57 and .90, with a mean (for all the raters) of .70.

The validity is rather high for the chief rater, A, except in one group (officer cadets 1958); it is satisfactory for two of the co-raters, B and C, in two groups and unsatisfactory in two, while it is unsatisfactory for the two remaining co-raters. The chief rater’s training by far exceeds that of the co-raters, but there is strong reason to believe that any psychologist, if properly trained in coding and rating, should be able to attain satisfactory validity; the lower limit of training seems to be at least about 14 days. The use of an abbreviated code version should certainly be avoided (cf. the low validities of Raters C and D).

TABLE 1
VALIDITIES AND INTERRATER RELIABILITIES OF THE DMT

Columns: Groups of Aviation Cadets; Pass-Fail Proportion; Rater; Prognosticated Proportion; Validity; Significance (p <); Interrater Reliability (r)

Groups: Ef and Er, Spring 1957 (pass-fail 48/29); Ef and Er, Autumn 1957; Ef and Er, Spring 1958; Ef, Autumn 1958; Ef, Spring 1957-Autumn 1958; Er, Spring 1957-Spring 1958; Co, Crf, and Ci, Summer 1957; Co, Crf, and Ci, Summer 1958; Danish attack divers, 1958; Danish attack divers, 1959

b Abbreviated code version.
c For only Co and …
d Two-tailed test.

The validity coefficient of Rater A increases from spring to autumn 1957, but then it drops for the officer candidates (Co, Crf, Ci) in 1958. Four factors may have contributed to this rather surprising result. First, a restriction of range has taken place in this group owing to a depression of the labor market, with a greater influx of applicants to the preliminary screening. Second, this screening has probably been made more effective by modifications of the screening battery, accentuating the restriction of range. Third, the percentage of those who failed has dropped from 55% in 1957 to 44% in 1958, while the DMT raters’ prognosticated percentages have, if anything, moved in the opposite direction. The teachers at the aviation school have reported increased difficulties in deciding on pass or fail in the group of 1958. The validities of the other tests forming part of the preliminary screening battery of the group are all low (little validity left after screening?). Finally, the DMT ratings of the 1958 group have been vitiated by a technical fault in the experimental procedure (second series defective).

TABLE 2
VALIDITIES (r_bi) OF TESTS AND RATINGS OF THE PRELIMINARY SCREENING BATTERY OF AVIATION CADETS

Groups of Aviation Cadets: Ef, Spring 1957; Er, Spring 1957; Ef and Er, Autumn 1957; Ef, Spring 1958; Er, Spring 1958; Ef, Autumn 1957-Autumn 1958; Er, Spring 1957-Spring 1958

Columns include: Intelligence; Intelligence Ef; Mechanical; Attention distribution

In two instances (Co, Crf, and Ci 1957, and Ef and Er 1958), there is a rather great discrepancy between the validity of Rater A and that of the co-rater, relative to the interrater reliability. The discrepancy is explained by differences in the prognosticated pass-fail proportion in the two cases, many of the co-raters’ wrong ratings being “surplus” positive ratings.

As regards the Danish attack divers, the samples are small for both raters, but the tendency is consistently in the positive direction. The criteria for the group tested in 1958 are better than those for the group tested in 1959.

The intercorrelations between the DMT ratings and the other tests and ratings applied at the preliminary screening are low (Table 3), which would increase the significance of the test.

TABLE 3
INTERCORRELATIONS (r) BETWEEN THE DMT RATINGS AND THE TESTS AND RATINGS OF THE PRELIMINARY SCREENING BATTERY OF AVIATION CADETS

DMT ratings for groups of Aviation Cadets: Ef, Autumn 1957-Autumn 1958; Er, Spring 1957-Spring 1958; Co, Crf, and Ci, Summer 1957

Columns include: Psychologist; Board

In evaluating coefficients of validity and reliability of the DMT, due consideration should also be paid to the fact that coding and rating (including, e.g., the prognostic value of the different signs) are still very far from perfection. One may also point to the fact that other well-analysed projective methods seem to have failed in predicting flying performance (Holtzman & Sells, 1954).


SUMMARY 

The Defense Mechanism Test is an experimental method for clinical diagnosis and for personnel selection. It is an instance of the actual-genetic techniques, or of the serial analysis of precognitive organization. It is administered as a group test, and each record may be evaluated in 5-10 minutes. By repeated tachistoscopic presentation of a picture with a central “hero” and a peripheral threatening person, reactions are obtained which are interpreted in terms of precognitive “defensive” organization superimposed upon anxiety.

In the experiment, it has been possible to define preliminarily the “classical” defense mechanisms of repression, isolation, denial, reaction formation, identification with the aggressor, and turning against the self in terms of types of precognitive defensive organization. In the concrete case, however, combinations of structures, the precognitive level, and the total series of structures to which the precognitive defensive organization in question belongs must also be taken into consideration.

Frequent and strong activity of defense mechanisms is supposed to lower the individual’s resistance against stress. Groups of Swedish cadets in primary and basic aviation training, and a small group of Danish attack divers, have been chosen to test this assumption, as well as the assumption of correspondence between defense mechanisms and precognitive defensive organization. In preliminary experiments, types of such organization have been identified mainly in those cadets who failed in primary and basic training. A code for the typification of precognitive defensive organization has been worked out, together with preliminary norms for a rating procedure. For subsequent groups tested, predictions of pass and fail have been made and matched against the criterion after about nine months.

For the aviation cadets, a very satisfactory validity has been found for the main rater, a rather satisfactory validity for one co-rater, and an unsatisfactory validity for three co-raters. As regards the attack divers, there is a positive tendency for both raters. Low co-rater validity is ascribed to unsatisfactory preparation for coding and rating, and to the use of an abridged version of the code. The interrater reliability in the samples varies between .51 and .90. Correlations between the DMT and other “conventional” tests and psychologists’ ratings for the screening of aviation cadets are low; the DMT is therefore likely to increase the validity of the total test battery.
TRACKING PERFORMANCE RELATED TO DISPLAY-CONTROL
CONFIGURATIONS¹


JAMES J. REGAN²


Fordham University 


The objectives of this research were to in- 
vestigate the effects that display-control fea- 
tures, control order, and kind of tracking have 
on tracking performance. Specifically, the fol- 
lowing variables were investigated: 

1. Linear controls and displays vs. rotary 
controls and displays 

2. Two-dimensional (coordinate) informa- 
tion and control presented in separate vs. 
combined configurations 

3. Display-control configurations contain- 
ing both linear and circular elements vs. those 
containing either all linear or all circular 

4. The interacting effects on the above of 
position and rate control and pursuit and 
compensatory tracking 

5. The effect of practice 

Earlier investigations have concerned them- 
selves with aspects of the above variables but 
usually with one-dimensional, position con- 
trol. For example, the importance of compat- 
ibility of display and control with respect 
to direction of motion has been established 
(Vince, 1944, 1950; Warrick, 1948). Thus, 
the left movement of a control should result 
in a left movement of the display to con- 
form to population expectancies. Investiga- 
tions (Loveless, undated; Simon, 1954) have 
also been made of the relative strengths of 
two competing expectancies—linear (e.g., up- 
up) and curvilinear (e.g., clockwise-clock- 
wise). In these competing cases, the curvi- 
linear set tends to predominate. This research 
was performed with rotary displays and, with 
one exception, rotary controls. Unanswered is 
the question of the relative merit of linear vs. 
rotary configurations. 


1 This paper is based on the writer’s doctoral
dissertation. The author gratefully acknowledges the
help of J. G. Keegan and J. F. Kubis. A detailed 
technical report of this study has been issued by the 
United States Naval Training Device Center, Port 
Washington, New York. 

2 Now with the United States Naval Training De- 
vice Center. 


In two-dimensional control, the X and Y 
coordinates may be controlled separately, or 
in combination through their vector sum. Al- 
though the combined mode is usually re- 
garded as superior, no clear determination 
has been made of the relative merit of these 
two methods. 

In addition, no single experiment has in- 
vestigated the possible interacting effect of 
control mode (position, rate), kind of track- 
ing (pursuit, compensatory), and practice on 
the above variables. 


METHOD 


Apparatus. The equipment used in this experiment 
was designed to simulate a variety of tracking situa- 
tions. The device consisted of displays, controls,
machine elements, and a program unit.

Displays. The following alternate sets of displays 
provided input (stimulus) information: 

1. Two circular displays each presenting one of the 
two dimensions of the input. These displays were 
aligned horizontally. They each contained two mov- 
ing elements—a black 1-in. square (target) on the 
outer circumference and a red %-in. equilateral tri- 
angle (follower) on an inner circumference—so ar- 
ranged that they were tangential to each other when 
aligned. Both elements moved through ±160° from
the 0 or 12 o'clock position on a 1%-in. radius circle. 

2. Two linear displays each presenting one of the 
two dimensions of the input. These displays were 
84 in. long and 1% in. wide. One was oriented 
vertically, one horizontally. In each of these, there 
were two moving elements, a square and a triangle, 
corresponding to those in the circular displays. The 
display elements moved through a distance equal to 
the 320° arc of the circular displays. 

3. One combined display presenting a vector plot 
of both dimensions of the input. This display also 
contained two moving elements: a 1-in. white square, 
and a green circle whose diameter was equal to the 
altitude of the triangles. Both elements were rear- 
projected by two miniature projectors and appeared 
as patches of light on a convex, translucent screen. 
They moved through an area bounded by distances 
equal to those in each of the other two display 
configurations. 

Controls. The following alternate sets of controls 
provided for the response of the operator: 

1. Two circular controls (hand cranks) each con- 
trolling one dimension of the output (response). 


These controls were 1! in. in length from the center
of rotation and had 1-in. hand grips offset 90° from
the control arm. They moved through an arc of
±160° from the 12 o’clock position. They were
spring loaded and returned to the 0 position (12
o’clock) when released.

2. Two simulated linear controls (levers) each 
controlling one dimension of the output. These levers 
were 8) in. in length from the pivot point. The end 
of each control moved through an arc of ±27°.
One lever moved in the horizontal axis, the other in
the vertical axis. The levers were spring loaded and
zeroed normal to the plane at the pivot point.

3. One “joystick” free to move in both coordinates 
and thus capable of simultaneously controlling both 
dimensions of the output. This control was 8% in. 
in length. It moved omnidirectionally in an arc of 27°
from the normal and was also spring loaded. 

Machine Elements. The ranges of translation of 
the display elements (squares, triangles, and circle) 
and controls (handwheels, levers, and joystick) were 
all equivalent. A 1-in. displacement of a control produced a follower change of 1 in. with position control and 1 in. per sec. with rate control. The displays were located on the face of a rectangular box (5′ 9″ × 4′ 4″ × 2′ 8″) which enclosed the program unit
and the electromechanical elements of the device. The 
displays were 46 in. from the floor and were about 
the eye level of a seated S. The three control con- 
figurations were mounted on separate tables which 
could be placed before any one of the three display 
configurations. The controls were connected to 
precision potentiometers which transmitted voltage 
changes proportional to control movement to drive 
the follower elements (e.g., triangles) of the displays, 
and to provide scoring information. These voltage 
changes could also activate electrical servosystems 
which changed control from position to rate. Thus, 
two control orders, position and rate, were available. 
The program unit could, through a clutch mechanism, 
activate a target (e.g., the black squares) independent 
of subject control, or, operating through a differential 
gear common to it and the control output, enable 
the S to nullify the effect of programed movement. 
In this way, both pursuit (independent movement of the target and follower) and compensatory (follower movement being the vector sum of the input and control movement) tracking were available.
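The control orders and tracking modes described above can be mimicked in software. In this sketch the gains (1 in., or 1 in. per sec., per inch of control displacement) come from the text; the sampling interval, the simple Euler integration for rate control, and the sign convention for the compensatory sum are assumptions, and the function names are hypothetical. It is an analogue of the electromechanical device, not a description of it.

```python
GAIN = 1.0  # 1 in. (position) or 1 in./sec. (rate) per inch of control displacement


def follower_track(control: list[float], dt: float, order: str) -> list[float]:
    """Operator's output contribution to the follower, one value per sample.

    order="position": follower displacement = GAIN * control displacement.
    order="rate":     follower velocity = GAIN * control displacement, so the
                      displacement is accumulated sample by sample (Euler step).
    """
    if order == "position":
        return [GAIN * u for u in control]
    if order == "rate":
        pos, out = 0.0, []
        for u in control:
            pos += GAIN * u * dt
            out.append(pos)
        return out
    raise ValueError("order must be 'position' or 'rate'")


def display_state(course, control, dt, order, mode):
    """Return (target, follower) positions shown to S at each sample.

    pursuit: the target moves with the course input and the follower moves
    only with the operator's output, so both are seen independently.
    compensatory: the target stays at 0 and the follower shows the sum of
    input and output (assumed sign convention), so S sees only the error.
    """
    output = follower_track(control, dt, order)
    if mode == "pursuit":
        return list(course), output
    if mode == "compensatory":
        return [0.0] * len(course), [c + o for c, o in zip(course, output)]
    raise ValueError("mode must be 'pursuit' or 'compensatory'")


# Usage: under position control, a control movement that mirrors the course
# keeps the compensatory follower on the stationary target.
target, follower = display_state([0.0, 0.5, 1.0], [0.0, -0.5, -1.0],
                                 dt=0.1, order="position", mode="compensatory")
print(target, follower)  # [0.0, 0.0, 0.0] [0.0, 0.0, 0.0]
```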

Program Unit. The program unit consisted of a 
single pattern etched on each of two circular discs,
one for each dimension of the stimulus. The discs
were oriented such that the origin of the X dimen-
sion became the 90° point of the Y dimension. That
is, the patterns were displaced ¼ turn. The discs
rotated at a constant speed of ⅓ rpm. Thus, it re-
quired 3 min. for the complete pattern to be traced.
The etched patterns were sensed by metal styli. This 
movement was transmitted through a series of tape 
and cable drives to the displays. In the case of the 
circular and linear displays, the X and Y dimensions 
were fed separately to each. The vector plot of both 
coordinates was fed through a gimbal system to the 
projectors of the combined display. 
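The timing of the program unit follows from the disc speed. The sketch below takes the speed as one third of a revolution per minute, which is what the stated 3-min. cycle implies, together with the quarter-turn displacement between the X and Y discs; it shows that the two dimensions trace the same pattern 45 sec. apart. The names are hypothetical, and exact fractions are used only to keep the arithmetic free of rounding.

```python
from fractions import Fraction

RPM = Fraction(1, 3)           # one revolution per 3 min.
PHASE_OFFSET = Fraction(1, 4)  # Y pattern displaced a quarter turn from X


def seconds_per_turn(rpm: Fraction) -> Fraction:
    """Time for one full revolution of a disc, in seconds."""
    return Fraction(60) / rpm


def disc_angle(t_seconds: Fraction, offset_turns: Fraction = Fraction(0)) -> Fraction:
    """Disc angle in degrees (0 <= angle < 360) at time t."""
    turns = t_seconds / seconds_per_turn(RPM) + offset_turns
    return (turns % 1) * 360


period = seconds_per_turn(RPM)  # 3 min. per complete pattern
lag = period * PHASE_OFFSET     # time separation of the X and Y patterns
print(period, lag, disc_angle(Fraction(0)), disc_angle(Fraction(0), PHASE_OFFSET))
# 180 45 0 90
```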

Scoring. Scoring was accomplished by means of a Standard timer which was activated when the target and follower were separated. The tolerance was set such that when both followers (triangles) were tangential to any portion of the 1-in. squares, no error was recorded. In the case of the combined display, no error was scored when any part of the follower (circle) touched any part of the white square. Error scores (time off target) were recorded in hundredths of a second.
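Time-off-target scoring can be sketched in the same spirit. The text states the criterion only geometrically (followers tangential to the 1-in. squares), so the tolerance value, the per-dimension test, and the sampling interval below are assumptions, and `time_off_target` is a hypothetical name. The sketch counts samples in which the follower is out of tolerance in either dimension and converts the count to seconds in hundredths, as the timer did.

```python
def time_off_target(target, follower, tolerance_in, sample_cs=1):
    """Accumulated time off target, in seconds (resolution: hundredths).

    target, follower: sequences of (x, y) positions in inches, one pair
    every `sample_cs` hundredths of a second.
    tolerance_in: assumed maximum separation per dimension that still
    counts as "on target".
    """
    off_samples = 0
    for t, f in zip(target, follower):
        # Off target if the follower is outside tolerance in either dimension.
        if any(abs(tc - fc) > tolerance_in for tc, fc in zip(t, f)):
            off_samples += 1
    return off_samples * sample_cs / 100


# Hypothetical samples taken every hundredth of a second.
target = [(0.0, 0.0)] * 4
follower = [(0.0, 0.0), (0.6, 0.0), (0.0, 0.2), (1.0, 1.0)]
print(time_off_target(target, follower, tolerance_in=0.5))  # 0.02
```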

Design of the Experiment. The variables tested were:


Display-Control Configurations (DCC)
    Circular Display-Circular Control (CC)
    Linear Display-Circular Control (LC)
    Circular Display-Linear Control (CL)
    Linear Display-Linear Control (LL)
    Combined Display-Circular Control (Comb. C)
    Combined Display-Joystick Control (Comb. J)
Control (C)
    Position
    Rate
Tracking (T)
    Pursuit
    Compensatory
TABLE 1
ANALYSES OF VARIANCE: LEVEL OF SIGNIFICANCE

Source          Trials 10-18    Trials 19-27
DCC             <.001           <.001
T               ns              ns
C               <.001           <.001
DCC × T         <.05            <.05
DCC × C         ns
T × C           <.01
DCC × T × C

A factorial design was used, yielding 24 conditions to be tested in a 6 × 2 × 2 variance model. Ninety-six male Ss were used. They were college students ranging in age from 17 to 25 and screened for normal vision using the Bausch and Lomb Ortho-Rater. Students with specialized prior tracking experience were not used. Four Ss were tested for each of the 24 conditions of the experiment. No S was tested for more than one condition.
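The bookkeeping of the 6 × 2 × 2 design is easy to check mechanically. This sketch enumerates the 24 cells with the condition labels from the text and counts the Ss in the subsets that are dropped in the later re-analyses of total trials (the Comb. J groups, the rate-control groups, and the Comb. J groups under position control). The variable and function names are hypothetical.

```python
from itertools import product

DCC = ["CC", "LC", "CL", "LL", "Comb. C", "Comb. J"]
CONTROL = ["Position", "Rate"]
TRACKING = ["Pursuit", "Compensatory"]
SS_PER_CONDITION = 4

conditions = list(product(DCC, CONTROL, TRACKING))  # 6 x 2 x 2 cells
n_subjects = len(conditions) * SS_PER_CONDITION      # each S in one cell only


def count_subjects(keep) -> int:
    """Number of Ss in the cells for which keep(dcc, control, tracking) is true."""
    return SS_PER_CONDITION * sum(1 for cell in conditions if keep(*cell))


comb_j = count_subjects(lambda d, c, t: d == "Comb. J")
rate = count_subjects(lambda d, c, t: c == "Rate")
comb_j_position = count_subjects(lambda d, c, t: d == "Comb. J" and c == "Position")
position_without_comb_j = count_subjects(lambda d, c, t: c == "Position" and d != "Comb. J")
print(len(conditions), n_subjects, comb_j, rate, comb_j_position, position_without_comb_j)
# 24 96 16 48 8 40
```

The last figure (40 Ss left for the position-control analysis) is derived arithmetic, 96 - 48 - 8, rather than a number stated in the text.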


The course input (Fig. 1) was designed to provide a two-dimensional task of sufficient difficulty (it was tracked with about 50% error for the average of all conditions) as well as one which did not form a perceptually meaningful pattern. The total pattern, containing a variety of rate, amplitude, and directional changes, was 3 min. in length. The problem was stopped every 120°, thus providing three course segments (I, II, and III), each of 1-min. duration.

The S was seated about 28 in. from the display-control configuration with which he was to be tested. The room was indirectly lighted such that light at the surface of the dual displays measured 1 ft-c. The illumination at the surface of the projected display was reduced (two-tenths of a ft-c) so that the target and follower would be clearly visible.

Each S was given appropriate instruction and tested during a single 1-hr. session. This session consisted of 10 presentations of the 3-min. pattern. Since the pattern was divided into three 1-min. intervals, each S received 10 presentations of three different segments. Each presentation of a segment was considered a trial. The first three trials were practice trials and were not used in the analysis. For scoring purposes, each S received 27 1-min. trials.
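The trial arithmetic can be made explicit. The sketch assumes, as the order of the text suggests, that the three practice trials are the three segments of the first presentation; the names are illustrative only. The scored trials then fall into the three blocks of nine (Trials 1-9, 10-18, 19-27) used in the practice analyses.

```python
PRESENTATIONS = 10             # presentations of the 3-min. pattern
SEGMENTS = ("I", "II", "III")  # 1-min. course segments per presentation
PRACTICE_TRIALS = 3            # first three 1-min. trials, not scored

all_trials = [(p, seg) for p in range(1, PRESENTATIONS + 1) for seg in SEGMENTS]
scored = all_trials[PRACTICE_TRIALS:]

# Blocks of nine scored trials for the separate analyses of practice.
blocks = [scored[i:i + 9] for i in range(0, len(scored), 9)]
print(len(all_trials), len(scored), [len(b) for b in blocks])
# 30 27 [9, 9, 9]
```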


RESULTS

The mean performance scores for each of the segments were computed and the differences were compared by means of t tests. None of the differences exceeded three-tenths of a second and none was significant. The segments could then be considered equivalent and were so treated in the analyses.

Analysis of variance was the statistical technique used in this experiment. Since the effect of practice was considered, separate analyses were performed for Trials 1-9, 10-18, 19-27, and total trials (Table 1). The scores used were the mean values for the first nine trials, second nine, etc. No significant differences emerged with practice. In view of this, the following will report only the results observed from the analysis of total trials (Table 2). The other analyses can be found in the original study (Regan, 1957). Table 2 shows that differences among the DCCs were significant (p < .001). The Comb. J was clearly superior (Table 3) to all other configurations, by a factor of about two. The remaining five configurations were closely grouped, with CC best and CL worst. The difference between compensatory (24.0 seconds of error) and pursuit (22.7 seconds of error) tracking was not significant. Position control resulted in 13.9 seconds of error, rate control in 28.6 seconds of error. This difference was significant (p < .001). The DCC × T interaction was significant (p < .01). No clear pattern emerged except that CC shifted in rank from fifth in pursuit to second in compensatory. C interacted significantly with T (p < .001). With position control, the mean error score of 12.0 in pursuit tracking was significantly superior to the mean error score of 17.2 in compensatory tracking. However, with rate control, the mean error score of 31.0 for compensatory tracking was slightly, though insignificantly, better than the mean error score of 33.5 obtained for pursuit tracking.

TABLE 2
ANALYSIS OF VARIANCE FOR TOTAL TRIALS

Source                 df        MS         F      Sig. Level
DCC                     5     464.19     16.37     <.001
T                       1      42.80      1.51     ns
C                       1    7457.14    262.95     <.001
DCC × T                 5      94.92      3.35     <.01
DCC × C                 5      22.62     <1.00     ns
T × C                   1     356.90     12.58     <.001
DCC × T × C             5      38.01      1.34     ns
Error (within cell)    72      28.36
Total                  95

In order to assay the significance of the differences among the DCC means after removing the markedly superior Comb. J configuration, an additional analysis of variance for total trials was performed omitting the 16 Ss who were tested with the Comb. J configuration. The DCC main effect was not significant. Thus, the Comb. J was the only significantly different configuration. The remainder of this analysis did not differ from the previous analysis (Table 2) on total trials.

Since position control is the more widely used form of control with inputs of the kind used in this experiment, a final analysis of variance was performed omitting those 48 Ss using rate control and those 8 using Comb. J and position control. In this analysis, the DCC main effect was significant (p < .05). An inspection of the means (Table 4) shows the compatible configurations (CC and LL) having least error and the Comb. C the most.
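As a consistency check on Table 2, the sketch below derives the degrees of freedom from the 6 × 2 × 2 design with four Ss per cell and recomputes each F as the effect mean square divided by the within-cell mean square (28.36). The mean squares are copied from the table; the names are hypothetical, and significance levels are not recomputed, since that would need an F distribution (for example from SciPy), which this standard-library sketch does not use.

```python
from itertools import combinations
from math import prod

LEVELS = {"DCC": 6, "T": 2, "C": 2}
N_PER_CELL = 4
MS_ERROR = 28.36
MEAN_SQUARES = {  # Table 2, total trials
    "DCC": 464.19, "T": 42.80, "C": 7457.14,
    "DCC x T": 94.92, "DCC x C": 22.62, "T x C": 356.90,
    "DCC x T x C": 38.01,
}


def effect_dfs(levels):
    """df of every main effect and interaction: product of (levels - 1)."""
    names = list(levels)
    dfs = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            dfs[" x ".join(combo)] = prod(levels[n] - 1 for n in combo)
    return dfs


dfs = effect_dfs(LEVELS)
n_cells = prod(LEVELS.values())
df_error = n_cells * (N_PER_CELL - 1)  # within-cell df
df_total = n_cells * N_PER_CELL - 1
f_ratios = {src: round(ms / MS_ERROR, 2) for src, ms in MEAN_SQUARES.items()}

for src in MEAN_SQUARES:
    print(f"{src:<12} df={dfs[src]}  F={f_ratios[src]}")
print("error df", df_error, "total df", df_total)
```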


DISCUSSION 


The combined-joystick configuration proved to be superior (one-half as much error) to the other five configurations under all conditions of the experiment. Combining the X and Y dimensions into one display while controlling X and Y with two controls (Comb. C) did not improve performance. Therefore, given two-coordinate information with two controls, combining the stimulus in a single display will be no different from presenting it separately in two one-dimensional displays. While this indicates that the response system of the operator is more sensitive to design parameters than is the sensory system, additional comparisons (e.g., dual displays with a single control) would be necessary to draw firm conclusions.

TABLE 3
DISPLAY-CONTROL CONFIGURATION MEAN ERROR SCORES IN SECONDS FOR TOTAL TRIALS

Configurations: CC, LC, CL, LL, Comb. C, Comb. J

TABLE 4
MEAN ERROR SCORES IN SECONDS FOR FIVE DISPLAY-CONTROL CONFIGURATIONS UNDER POSITION CONTROL

The superiority of Comb. J over all other configurations does not conform with Andreas’ (1955) result. However, as Andreas points out, his joystick had several undesirable features, and his task was a discontinuous one, involving periodic positioning with knobs.

In view of the insignificant DCC main effect when Comb. J was removed, two-coordinate information presented and controlled separately can be handled equally well with linear or rotary configurations or with configurations combining linear and rotary features.

The above result is, in a sense, averaged over both position and rate control and pursuit and compensatory tracking. However, the operator in any given situation will be using only one mode of control and tracking. Consequently, the results of the analysis using position control alone are important. Under position control, the CC configuration proved to be superior to the other configurations. This is consistent with the research of Loveless (undated) and Simon (1954), in which they found best those configurations in which the display-control directional relationships were simultaneously clockwise-clockwise and left-left, right-right. In the present experiment, CC was followed by LL, LC, and CL. This order is consistent with the increasing ambiguity of the directional relationships, although the differences, other than the superiority of CC, have not been shown to be significant.

An important outcome of this study was the significant T × C interaction. Pursuit tracking was significantly superior to compensatory under position control. This conforms to most of the research in this area and is explained on the basis that in pursuit the operator has


more information about the stimulus with 
which he deals. Under rate control, however, 
there was no significant difference between 
compensatory and pursuit although compen- 
satory yielded less error. This is related to 
earlier findings (Chernikoff, Birmingham, & 
Taylor, 1955) in which aided pursuit was 
inferior to unaided pursuit. In the present 
study, the attempt to predict rate changes of 
an input containing many directional changes 
may have proved not only inefficient but dis- 
tracting. However, since rate control proved 
to be twice as difficult as position control, the 
interaction may have been related to task 
difficulty. 


SUMMARY 


The relative merit of six different display- 
control configurations was determined in a 
continuous tracking task using both pursuit 
and compensatory tracking, and position and 
rate control. Ninety-six Ss were tested, four 
on each of the 24 conditions of the experiment. 
The following conclusions were reached: 

1. The combined-joystick configuration was 
significantly superior to the other five config- 
urations for all conditions of the experiment. 

2. With both forms of tracking and both 
modes of control, there were no significant dif- 
ferences among the remaining five configura- 
tions. With both forms of tracking but with 
position control alone, there was a significant 
difference among these five configurations 
with the circular-circular configuration being 
superior. 

3. Pursuit tracking was not different from 
compensatory across both modes of control. 
Under position control, pursuit was signif-
icantly superior; under rate control, compen-
satory was superior just short of significance. 

4. Position control was significantly supe- 
rior to rate control for all conditions of the 
experiment. 

5. Practice did not significantly affect the 
relative merit of any of the variables tested. 
THE EFFECT OF POINTER WIDTH AND MARK WIDTH
ON THE ACCURACY OF VISUAL INTERPOLATION ¹


A. V. CHURCHILL 


Defence Research Medical Laboratories, Toronto, Canada 


Studies of visual interpolation reported by 
Grether and Williams (1947, 1949) indicate 
that a scale interval of approximately 0.5 
inch, presented at a 30-inch viewing distance, 
is the optimal length for interpolation in 
tenths. Results obtained by Kappauf and 
Smith (1948, 1950) and Leyzorek (1949) 
tend to support this conclusion. However, two 
recent studies (Churchill, 1956, 1959) report 
an optimal interval length of 1.0—1.5 inches. 
This discrepancy required an explanation. 

The displays used in the 1956 study, and
the first experiment of the 1959 study, were
examined in an effort to uncover any design 
factor which had not been controlled. It was 
found that while the length of the scale inter- 
val increased from 0.25 inch to 3.0 inches, the 
pointer remained at a constant width of 0.128 
inch, resulting in a variable ratio of pointer 
width to scale unit width (one scale unit being 
equal to one-tenth of the interval length when 
interpolation is in tenths). Thus, on intervals 
0.25, 0.50, 0.75, 1.0, 1.5, 2.0, and 3.0 inches 
in length, the 0.128-inch wide pointer ex- 
tended over approximately 5.12, 2.56, 1.71, 
1.28, 0.85, 0.64, and 0.43 scale units, respec- 
tively. The optimal interval lengths of 1.0 and 
1.5 inches were those interval lengths at 
which the pointer width was nearest to one 
scale unit, i.e., 1.28 and 0.85 of a scale unit. 
In the second experiment of the 1959 study, 
the width of the pointer was a constant pro- 
portion of the scale interval length, i.e., the 
pointer was approximately 0.85 scale unit in 
width for all interval lengths. 
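The ratio discussed above is simple arithmetic: with interpolation in tenths, one scale unit is one-tenth of the interval length, and the pointer's extent in scale units is the pointer width divided by that unit. The following Python sketch reproduces the figures quoted for the constant 0.128-inch pointer; the function name `pointer_ratio` is our own illustration, not something from the source.

```python
def pointer_ratio(pointer_width_in: float, interval_in: float, divisions: int = 10) -> float:
    """Pointer width expressed in scale units, where one scale unit is
    interval_in / divisions (divisions=10 for interpolation in tenths)."""
    scale_unit = interval_in / divisions
    return pointer_width_in / scale_unit


# Interval lengths (inches) of the 1956 study and Exp. 1 of the 1959 study.
intervals = [0.25, 0.50, 0.75, 1.0, 1.5, 2.0, 3.0]
POINTER = 0.128  # constant pointer width in inches

ratios = [round(pointer_ratio(POINTER, i), 2) for i in intervals]
for length, ratio in zip(intervals, ratios):
    print(f"{length:4.2f}-inch interval: pointer spans {ratio:.2f} scale units")
```

The 1.0- and 1.5-inch intervals give 1.28 and 0.85, the two values closest to one scale unit.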

The displays used by Grether and Williams had a constant pointer width of 0.094 inch for all interval lengths ranging from 0.32 to 1.309 inches; as a result the pointer extended over approximately 31.3 scale units on the shortest interval to 0.7 scale unit on the longest interval. Their results showed equal accuracy when interpolations were made on intervals 0.567, 0.654, and 0.872 inch in length, on which the pointer was 1.6, 1.4, and 1.1 scale units wide, respectively.

¹ Defence Research Medical Laboratories Report No. 164-10, DRML Project No. 164, PCC No. D77-94-20-27, H. R. No. 187.

The displays used by Kappauf and Smith 
gave a range of intervals from 0.073 to 1.76 
inches in length, with a constant pointer width 
of 0.12 inch on 2.8-inch diameter dials and 
0.06 inch on 1.4-inch diameter dials. Thus, 
the pointer width varied from approximately 
8.22 scale units on the shortest interval to 
0.68 scale unit on the longest interval. Great- 
est accuracy was obtained with intervals 1.76 
and 0.88 inch long on the 2.8-inch dial, and 
0.88 and 0.44 inch long on the 1.4-inch dial, 
on which the pointers were 0.68, 1.36, 0.68, 
and 1.36 scale units wide, respectively. Their 
results showed that with a 0.44-inch interval 
on the 2.8-inch diameter dial, where the 
pointer was 2.73 scale units wide, 27.4% of 
the readings were in error, whereas, with a 
0.44-inch interval on the 1.4-inch diameter 
dial, where the pointer was 1.36 scale units 
wide, 16.2% of the readings were in error. 
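The same arithmetic can be checked against the Kappauf and Smith dials, where pointer width depended on dial diameter. A minimal sketch follows. It assumes, as the quoted 8.22 figure implies, that the shortest (0.073-inch) interval was on the 1.4-inch dial. The dictionary layout and names are illustrative only.

```python
def pointer_ratio(pointer_width_in: float, interval_in: float, divisions: int = 10) -> float:
    """Pointer width in scale units (one unit = interval / divisions)."""
    return pointer_width_in / (interval_in / divisions)


# Dial diameter (inches) -> constant pointer width (inches) on that dial.
POINTER_BY_DIAL = {2.8: 0.12, 1.4: 0.06}

# (dial diameter, interval length) pairs mentioned in the text.
cases = [
    (2.8, 1.76), (2.8, 0.88),  # most accurate intervals, 2.8-inch dial
    (1.4, 0.88), (1.4, 0.44),  # most accurate intervals, 1.4-inch dial
    (2.8, 0.44),               # 27.4% of readings in error
    (1.4, 0.073),              # shortest interval (assumed on the 1.4-inch dial)
]

results = {
    case: round(pointer_ratio(POINTER_BY_DIAL[case[0]], case[1]), 2)
    for case in cases
}
for (dial, interval), ratio in results.items():
    print(f"{dial}-inch dial, {interval}-inch interval: {ratio} scale units")
```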

The above analysis suggested that the ratio 
of pointer width to scale unit width might be 
an important factor in visual interpolation, 
i.e., the nearer the pointer width approaches
one scale unit, the greater will be the accuracy
of interpolation. If this is the case, then the 
ratio of pointer width to scale unit width 
might determine the optimal length of inter- 
val for interpolation in tenths. It was also 
noted that scale mark width, i.e., the width of 
the marks at the extremities of the interval, 
varied with the variations in interval length 
on many of the displays in the above studies. 

A preliminary experiment was conducted to examine the effect of the ratio of pointer width to scale unit width on interpolation accuracy, and on scale interval length. Five Ss were presented with three different displays, i.e., a 0.5-inch interval with a pointer approximately 1.0 scale unit wide, a 1.5-inch interval with a pointer approximately 1.0 scale unit wide, and a 1.5-inch interval with a pointer approximately 0.33 scale unit wide. Results showed that an increase in pointer width from 0.33 scale unit to 1.0 scale unit on the 1.5-inch interval was accompanied by a decrease in errors from 25.6% to 14.4%. An increase in interval length from 0.5 to 1.5 inches, with a 1.0-unit-wide pointer in each case, was accompanied by little change in error, i.e., 15.6% on the 0.5-inch interval and 14.4% on the 1.5-inch interval.

The present study was designed to investigate the effect of pointer width and scale mark width on the accuracy of interpolation in tenths on scale intervals of different lengths.

TABLE 1
ANALYSIS OF VARIANCE FOR VISUAL INTERPOLATION
UNDER THIRTY-SIX EXPERIMENTAL CONDITIONS

Source
S (Subjects)
D (Viewing Distance)
Error 1
L (Interval Length)
L × D
Error 2
P (Pointer Width)
M (Mark Width)
P × M
P × L
M × L
P × M × L
P × D
M × D
P × M × D
P × L × D
M × L × D
P × M × L × D
Error 3
Total

* p < .05


METHOD 


Apparatus. The apparatus has been described (Churchill, 1959, Exp. 2). The displays consisted of three different interval lengths, 0.5, 1.5, and 3.0 inches; three pointer widths, 0.25, 1.0, and 4.0 scale units; and two scale mark widths, 0.25 and 1.0 scale unit. The 18 conditions were presented at both 28- and 56-in. viewing distances. Display brightness was 2 ft-L for all conditions.

Procedure. The 10 female Ss were paid for their participation in the experiment. The 18 experimental conditions were presented at the 28- or 56-in. viewing distance to alternate Ss. In a second session, one week later, Ss were presented with the 18 conditions at the other viewing distance. Within one session the three interval lengths were presented in random order. The six pointer width × scale mark width conditions were presented in random order for each interval length. Eighteen stimuli, i.e., the pointer at unit Positions 1 through 9 twice each, were presented in random order for each of the conditions. Viewing distance was controlled by a chin-and-head-rest. Under each of the conditions, Ss were first shown the pertinent display, with the pointer at “0,” and then at “10,” for orientation. The 18 random settings were then exposed each for 0.25 sec., and each exposure was followed by a 4-sec. interval for Ss’ response. A 1-min. rest period followed the presentation of each pointer width × mark width condition, and a 5-min. rest period followed the presentation of each interval length, i.e., the six pointer width × mark width conditions. The confounding of the viewing distances and interval lengths, between days and trials, respectively, was designed to assure a maximum of precision in measuring the effects of pointer width and mark width, and their interactions.
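The nesting of the randomization can be made concrete with a short Python sketch that builds one S's two sessions. Interval lengths are shuffled within a session, the six pointer × mark conditions are shuffled within each interval length, and positions 1 through 9 (twice each) are shuffled within each condition. The seeding scheme, the even/odd rule for alternating starting distances, and all function names are illustrative assumptions. Exposure timing and rest periods are omitted.

```python
import itertools
import random

INTERVALS = [0.5, 1.5, 3.0]        # interval lengths, inches
POINTER_WIDTHS = [0.25, 1.0, 4.0]  # scale units
MARK_WIDTHS = [0.25, 1.0]          # scale units
POSITIONS = list(range(1, 10))     # pointer at unit positions 1..9


def session_schedule(rng: random.Random):
    """Trials for one session at a single viewing distance, as
    (interval, pointer_width, mark_width, position) tuples."""
    trials = []
    intervals = INTERVALS[:]
    rng.shuffle(intervals)                          # interval lengths in random order
    for interval in intervals:
        combos = list(itertools.product(POINTER_WIDTHS, MARK_WIDTHS))
        rng.shuffle(combos)                         # six P x M conditions in random order
        for pointer, mark in combos:
            stimuli = POSITIONS * 2                 # each position twice: 18 stimuli
            rng.shuffle(stimuli)
            trials.extend((interval, pointer, mark, pos) for pos in stimuli)
    return trials


def subject_sessions(subject_index: int, seed: int = 0):
    """Alternate Ss start at 28 or 56 in.; the second session uses the other distance."""
    rng = random.Random(seed + subject_index)
    first = 28 if subject_index % 2 == 0 else 56
    second = 56 if first == 28 else 28
    return {first: session_schedule(rng), second: session_schedule(rng)}
```

Each session yields 3 × 6 × 18 = 324 exposures, covering all 18 display conditions at one viewing distance.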


RESULTS 


Data were tabulated in percentages of interpolations in error² and transformed to degrees ($\theta = \sin^{-1}\sqrt{p}$) to satisfy the assumptions of analysis of variance (Quenouille, 1950). Results of the analysis of the 36 experimental conditions are presented in Table 1.

[Figure 1 legend: viewing distance (inches), 28 and 56]

Fig. 1. The effect of pointer width on accuracy of interpolation in relation to interval length, to mark width, and to viewing distance.

² Since consideration of error magnitude does not alter the results (less than 2% of the errors were greater than one unit), only error frequency data are presented here.
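The angular transformation cited above ($\theta = \sin^{-1}\sqrt{p}$, in degrees) can be sketched in Python as follows. The function name is our own. Percentages are assumed to be converted to proportions before the transform.

```python
import math


def angular_transform(p: float) -> float:
    """Arc sine transformation of a proportion p in [0, 1], in degrees:
    theta = asin(sqrt(p))."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.degrees(math.asin(math.sqrt(p)))


# Percentages of interpolations in error, converted to proportions first.
for pct in (0.0, 14.4, 25.0, 50.0, 100.0):
    print(f"{pct:5.1f}% -> {angular_transform(pct / 100):6.2f} degrees")
```

The transform stretches the scale near 0% and 100%, where the variance of a proportion shrinks, so that cell variances are more nearly equal.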

The significance of the pointer width effect, 
shown in Table 1, reflects the greater accur- 
acy of interpolation with a pointer 1.0 scale 
unit wide than with a pointer 0.25 or 4.0 
units wide, under all conditions. The effect 
of pointer width on accuracy of interpolation 
is shown in relation to interval length, to 
mark width, and to viewing distance, in Fig- 
ures 1A, 1B, and 1C, respectively. 

Figure 1A also shows that pointer width 
has some effect on the optimal interval length, 
i.e., interpolation on the 0.5-inch interval with the 1.0-unit-wide pointer is as accurate as interpolation on the 1.5- or 3.0-inch interval with the 0.25- or 4.0-unit-wide pointer.

The significance of the mark width effect, shown in Table 1, reflects the greater accuracy of interpolation obtained with a scale mark 1.0 unit wide than with a scale mark 0.25 unit wide, under all conditions. The effect of mark width on accuracy of interpolation is shown in relation to interval length, to pointer width, and to viewing distance, in Figures 2A, 2B, and 2C, respectively.

The significance of the interval length effect is shown in Figures 1A and 2A, i.e., greater accuracy was obtained with the 1.5- and 3.0-inch intervals than with the 0.5-inch interval.

The significant P × M × D interaction was the result of anomalous responses to the 4.0-unit pointer and 0.25-unit mark condition at the two viewing distances, as shown in Figure 3.

Fig. 2. The effect of mark width on accuracy of interpolation in relation to interval length, to pointer width, and to viewing distance.

[Figure 3 axis: percentage of interpolations in error]

Fig. 3. The effect of viewing distance on accuracy of interpolation in relation to pointer width and to mark width (three interval lengths combined).


DISCUSSION 


The purpose of the present study was to 
investigate the effects of pointer width and 
scale mark width on the accuracy of inter- 
polation with different interval lengths. The 
design of the experiment permitted maximum precision in the measurement of the pointer width and mark width effects, with less precision in the measurement of the interval length and viewing distance effects. This difference in precision of measurement is demonstrated by the comparison of the error terms in Table 1 with the theoretical residual.
Under the conditions of this experiment, it 
is seen that a pointer which is 1.0 scale unit 
in width generates less error than a narrower 
or a wider pointer. Also, it is seen that the 
accuracy of interpolation is increased when 
this optimal pointer width is combined with 
scale marks which are 1.0 scale unit in width. 
These results are not in agreement with those reported by Kappauf and Smith (1950):

accuracy of interpolation between marks of a given thickness is dependent upon the separation of those marks and is independent of essentially all other factors in the design of the display; this applies to scales where the marks are about 0.015 inch in stroke thickness (p. 15).


In the present study, mark width, i.e., stroke thickness, ranged from 0.0125 to 0.30 inch, the pointer width effect being the same for all mark widths, i.e., greater accuracy when the pointer was 1.0 scale unit in width. It is also to be noted that the displays used by Kappauf and Smith consisted of circular scales, while those reported on here consisted of one horizontal interval, which eliminated scale-reading errors as such.

There was some evidence, far from conclu- 
sive, that the ratio of pointer width to scale 
unit width has an effect on the optimal length 
of interval for interpolation in tenths. Thus, 
as shown in Figure 1A, accuracy on the 0.5- 
inch interval with a 1.0-unit pointer was 
slightly better than on the 1.5- and 3.0-inch 
intervals with a 0.25- or 4.0-unit pointer. 
Further analysis showed that accuracy on the 
0.5-inch interval with the optimal pointer and 
scale mark widths was considerably better 
than on the 1.5- and 3.0-inch intervals with 
the nonoptimal pointer and scale mark widths. 

As shown in Figures 1A and 2A, there was no overall difference in accuracy on the 1.5- and 3.0-inch intervals. However, further analysis revealed that accuracy was greater on the 1.5-inch interval at the 28-inch viewing distance and on the 3.0-inch interval at 56 inches. These results suggest that viewing distance might have an effect on optimal interval length, which is contrary to results previously reported (Churchill, 1959).
Brainstorming as a method of problem solving has received a great deal of attention in
recent years. It usually involves a group at- 
tempt to solve a problem by following the 
four basic rules described by Osborn (1957). 
Osborn emphasizes the value of group inter- 
action in facilitating the flow of ideas. This 
technique may be viewed as another function 
of small group processes. An important study 
of brainstorming was conducted by Taylor, 
Berry, and Block (1957). In their study, 12 
groups of four men each, and 48 individuals, 
after training in brainstorming, were asked to 
solve three problems by that method. Two of 
these problems were essentially: (a) the 
Tourist Problem—concerned with finding 
ways to attract European tourists to visit 
America during their vacations and (b) the
Thumb Problem—concerning practical ben- 
efits or difficulties which would arise if every- 
one born after 1960 had an extra thumb on 
each hand. These two problems plus two 
others were used in the present study. Taylor 
and his associates studied the effect of group 
vs. individual brainstorming upon mean num- 
ber of ideas, mean number of unique ideas, 
and upon ratings of Feasibility, Effectiveness, 
Generality, Probability, and Significance. 

A major contribution of Taylor has been 
his method of analyzing the results of a de- 
sign where the question is not the comparison 
of a group vs. an individual but instead the 
comparison of actual group performance vs. 
the performance of an equal number of in- 
dividuals working independently but whose 
productions are treated as if they had actually 
worked together. Taylor calls these groups 
“nominal groups.” 

Taylor et al. found that the performance of the real groups was markedly inferior to that of the nominal groups on all three problems on all the above variables. Thus they conclude that group participation when using brainstorming inhibits creative thinking. They point out the need for further systematic research in varying the kind of problem, kind of groups, and degree of training.

¹ Presented at the Annual Convention of the American Psychological Association, September 1959.

The hypotheses of the present study are: (a) Cohesive brainstorming groups will produce a greater number of ideas and more unique ideas than either nominal or noncohesive groups on neutral problems, since people who wish to work together should experience less inhibition and criticism than those in other groups. (b) These superiorities will be significantly greater when the groups deal with ego-involving problems, since these problems presumably add additional stress. (c) Cohesive groups will produce qualitatively better ideas (using Taylor’s scales of Feasibility, Effectiveness, Generality, Probability, and Significance) than nominal or noncohesive groups, for the same reason. (d) Trained groups will perform significantly better on all variables than their untrained counterpart groups, since training will increase the probability of the operation of the method. (e) Perception of brainstorming skill in others is significantly related to partner choice for brainstorming sessions.


PROCEDURE 


One of the authors conducted courses of 10 hours 
in creative thinking with hospital administrative and 
professional personnel, all of whom volunteered to 
serve as subjects in this study. These trained Ss were 
compared with an equal number of untrained affiliate 
nurses of equally superior intelligence, as measured 
by the Otis Self-Administering Test of Mental Abil- 
ity. The affiliate nurses who volunteered had no such 
detailed training in creative thinking but only a 
single indoctrination session with the experimenters, 
who explained the method and made sure that they 
understood, felt comfortable with the method, and 
would cooperate in the study. 

All Ss sociometrically ranked all other Ss within their respective training groups in terms of partner preference for brainstorming. They also ranked each other (including the ranker) in terms of brainstorming skill. In both trained and untrained samples, subgroups of two persons each were formed: cohesive, noncohesive, and “nominal.” The cohesive pairs were formed, within the trained and untrained groups, by pairing together those Ss who, on the sociometric rating, preferred each other as brainstorming partners within the top six of the total group. The noncohesive pairs were formed analogously by pairing together those Ss who, on the same sociometric rating, least preferred each other within the bottom six of the total group. Thus there were 48 Ss in all, with 24 of them working under trained and 24 working under untrained conditions. Within each of these conditions there were three subgroups: cohesive, noncohesive, and nominal. The latter were 8 Ss in the trained sample and 8 in the untrained sample who worked independently and singly but whose productions were paired by chance within their training groups and treated as if the Ss had actually worked together.

TABLE 1
MEANS AND STANDARD DEVIATIONS FOR NUMBER OF RESPONSES
AND NUMBER OF UNIQUE RESPONSES

Columns: Cohesive (Trained, Untrained); Noncohesive (Trained, Untrained); Nominal (Trained, Untrained)

Tourist Problem: Mean Responses, SD; Mean Unique Responses, SD
Thumb Problem: Mean Responses, SD; Mean Unique Responses, SD
Discharge Problem: Mean Responses, SD; Mean Unique Responses, SD
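The pairing rule can be expressed as a mutual-choice test on the partner-preference rankings. The following is a minimal Python sketch under one reading of the text: a pair is cohesive if each member placed the other within his or her top six, and noncohesive if each placed the other within the bottom six. The data layout and function names are hypothetical, and the source does not say how conflicts among several qualifying pairs were resolved, so the sketch only classifies a candidate pair.

```python
from typing import Dict, List, Optional

# Hypothetical layout: each rater maps to the other Ss in the training group,
# ordered from most to least preferred brainstorming partner.
Rankings = Dict[str, List[str]]


def in_top(rankings: Rankings, rater: str, other: str, k: int) -> bool:
    """True if `rater` placed `other` among the k most preferred partners."""
    return other in rankings[rater][:k]


def in_bottom(rankings: Rankings, rater: str, other: str, k: int) -> bool:
    """True if `rater` placed `other` among the k least preferred partners."""
    return other in rankings[rater][-k:]


def classify_pair(rankings: Rankings, a: str, b: str, k: int = 6) -> Optional[str]:
    """'cohesive' for mutual top-k choices, 'noncohesive' for mutual
    bottom-k choices, None if the pair qualifies as neither."""
    if in_top(rankings, a, b, k) and in_top(rankings, b, a, k):
        return "cohesive"
    if in_bottom(rankings, a, b, k) and in_bottom(rankings, b, a, k):
        return "noncohesive"
    return None
```

With a toy group of four and `k=1`, A and B choosing each other first form a cohesive pair, and B and C placing each other last form a noncohesive one.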

The kind of problem was varied by using two of Taylor’s problems (Tourist and Thumb), which were non-ego-involving, and two especially devised problems pertaining to hospital professional and administrative issues. These were determined to be more ego-involving than the two Taylor problems.² Briefly, these two new problems are: (a) the Discharge Problem, concerning the practical benefits and difficulties which would ensue were all patients of our psychiatric hospital to be able to return to their families, and (b) the Treatment Problem, to devise new toys, games, or gadgets which would be helpful in the treatment of the psychiatric patient. As in Taylor’s study, 1… minutes were allowed for each problem. No S knew the reason for the study or the paired groupings. In tallying responses for each pair, all duplicate responses were thrown out. Thus if any member of the nominal pair presented an idea, the pair itself was regarded as having presented it. If two members of the pair presented the same idea, or the real pair repeated an idea, it was treated as only one idea.

The interjudge reliability of Taylor’s five scales was determined in two steps by four staff psychologists trained in the use of his scales. In the first phase, these judges independently rated the first response of each pair of Ss on the scales appropriate to each problem. If the first response of a pair had already been given by another pair and previously rated by the judges, the second response of that pair was presented to them for rating. In the second phase, the judges were retrained and rated every fifth response of each pair.

² A group of 30 comparable Ss were asked to compare each of the four problems with all others in terms of the importance to them of performing well. Each S made six paired comparisons for this importance factor. Chi square indicates that the Tourist and Thumb Problems were viewed as equally important but significantly less important than the Discharge and Treatment Problems. These latter two were viewed as equally important.
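The tallying rule for real and nominal pairs (pool both members' ideas, count each idea once) can be sketched in Python. In the study, "the same idea" was a human judgment. Here it is replaced by a crude, hypothetical string normalization, and the random pairing of individuals into nominal pairs is likewise an illustrative assumption about how the chance pairing might be done.

```python
import random
from typing import Callable, List, Sequence, Tuple


def normalize(idea: str) -> str:
    """Crude stand-in for judging two responses to be 'the same idea':
    lower-case and collapse whitespace (an assumption, not the study's rule)."""
    return " ".join(idea.lower().split())


def tally_pair(*members: Sequence[str], same: Callable[[str], str] = normalize) -> List[str]:
    """Pool the ideas of a pair (real or nominal), keeping each idea once,
    whether one member repeated it or both members offered it."""
    seen = set()
    pooled = []
    for ideas in members:
        for idea in ideas:
            key = same(idea)
            if key not in seen:
                seen.add(key)
                pooled.append(idea)
    return pooled


def nominal_pairs(subjects: Sequence[str], rng: random.Random) -> List[Tuple[str, str]]:
    """Pair by chance individuals who worked alone, as for the nominal groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled) - 1, 2)]
```

For example, `tally_pair(["Lower air fares", "Advertise in Europe"], ["advertise  in europe", "Package tours"])` credits the pair with three ideas, not four.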


RESULTS 


The Tourist, Thumb, and Discharge Problems gave a great variety of responses and were analyzed. However, the Treatment Problem was so difficult, and was interpreted so differently by the groups, that it was considered not comparable with the other three problems. It was therefore eliminated from the analysis.





Group Cohesiveness and Creative Thinking


1. An analysis of variance was done for each of the three problems in terms of the mean number of ideas and mean number of unique ideas. Table 1 shows the means and standard deviations for number of responses and number of unique responses for all problems. Table 2 shows the appropriate analyses of variance for these variables. On the two neutral problems (Tourist and Thumb), cohesiveness, training, and their interaction were not significant for either number of ideas produced or uniqueness of these ideas. Hypothesis a is not upheld.


2. The analyses of variance for the Discharge Problem indicate that, as with the other two problems, there is no difference between the groups in terms of number of ideas. However, there are significant differences in training and cohesiveness when number of unique ideas is measured. The t values shown in Table 3 note where these significant differences occur. This table shows that in the production of unique ideas to ego-involving problems, the cohesive-trained groups did significantly better than all other groups, whether trained or untrained. With untrained Ss the cohesive groups did significantly better than the nominal groups. The noncohesive trained groups did not produce significantly more unique ideas than their untrained counterparts. However, training was significantly related to more unique ideas in the cohesive and nominal groups. Hypotheses b and d were supported only in regard to number of unique ideas produced.
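Each F in the Discharge Problem analysis of unique responses (Table 2) is the effect mean square divided by the within-groups mean square, as in a fixed-effects two-way design. The Python sketch below recomputes them from the tabled mean squares. That within-groups is the error term for every effect is our reading, and it is consistent with the printed F values.

```python
# Mean squares for number of unique responses, Discharge Problem (Table 2).
MS_UNIQUE_DISCHARGE = {
    "Between training": 192.66,
    "Between groups": 135.04,
    "Interaction": 8.79,
}
MS_WITHIN = 6.94  # within-groups mean square, used as the error term


def f_ratio(ms_effect: float, ms_error: float) -> float:
    """Variance ratio F = MS_effect / MS_error."""
    return ms_effect / ms_error


f_values = {
    source: round(f_ratio(ms, MS_WITHIN), 2)
    for source, ms in MS_UNIQUE_DISCHARGE.items()
}
for source, f in f_values.items():
    print(f"{source}: F = {f}")
```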

3. In phase one of the reliability testing of the five qualitative scales of Taylor, the 72 interjudge correlations ranged between −.48 and .90, with four falling above .70; 22 falling between .40 and .69; 29 falling between .10 and .39; and the remaining 17 falling between .09 and −.48. There was no consistency in the correlations between any two judges on the different scales.

Taylor³ reports correlations ranging from .47 to .83 based on two different raters for the Tourist and Thumb Problems. He suggested that the reliability coefficients of the present study would have been higher had there been less restriction of the range of variation in the sample of responses. The second reliability check of this study took this criticism into account by having the four judges rate every fifth response of all pairs, after additional training. Even so, no reliability coefficient between any pair of judges exceeded .68, and most were considerably lower.

³ Taylor, D. W. Personal communication, 1959.

TABLE 2
ANALYSES OF VARIANCE FOR NUMBER OF RESPONSES
AND NUMBER OF UNIQUE RESPONSES

                       Number of Responses    Number of Unique Responses
Source                 MS         F           MS         F
Tourist Problem
  Between training     12.58
  Between groups       42.19
  Interaction          29.62
  Within groups        30.32
Thumb Problem
  Between training     20.16                  70.04
  Between groups       50.04      1.39         1.29
  Interaction           4.54                   7.04
  Within groups        36.11                  21.85
Discharge Problem
  Between training      4.16                 192.66     27.76*
  Between groups        5.29                 135.04     19.46*
  Interaction          10.04                   8.79      1.27
  Within groups        72.11                   6.94

* Significant at .001 level.

TABLE 3
… THE DISCHARGE PROBLEM

Comparisons among: Cohesive Trained, Noncohesive Trained, Nominal Trained, Cohesive Untrained, Noncohesive Untrained, Nominal Untrained

* Significant at .02 level.
** Significant at .01 level.

It would have been possible at this point, 
of course, to use the Taylor scales to provide 
a measure of mean differences in quality of 
ideas among the experimental groups. How- 
ever, because we were also seeking scaling 
devices of high enough reliability to test more 
refined hypotheses, it was felt advisable not 
to explore this issue until further reliability 
work was done with these scales. Hypothesis
c was therefore not tested.

4. Correlations between ratings of percep- 
tion of brainstorming skill in others and 
brainstorming partner choice were worked out 
for each S and reveal a significant relationship in all but 9 of the 48 Ss (81%). Thus, sociometric choices for brainstorming partners were significantly related to rankings in terms of perception of skill. Hypothesis e is upheld.
There were no significant differences among 
the groups in the means of the skill perception 
ratings of their Ss. 
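The source does not name the correlation used for each S. A natural choice for two rankings is Spearman's rank correlation, sketched below in Python as an assumption. Because the skill ranking included the ranker while the partner ranking did not, the hypothetical helper `rerank_without` drops the ranker and re-ranks the remaining Ss before the two rankings are compared.

```python
from typing import Dict, Sequence


def rerank_without(ranking: Dict[str, int], drop: str) -> Dict[str, int]:
    """Remove one person (e.g., the ranker) and renumber the rest 1..n."""
    kept = sorted((rank, person) for person, rank in ranking.items() if person != drop)
    return {person: i + 1 for i, (_, person) in enumerate(kept)}


def spearman_rho(ranks_a: Sequence[int], ranks_b: Sequence[int]) -> float:
    """Spearman rank correlation for two untied rankings of the same n items:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of equal length, n >= 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Share of Ss whose correlation was significant, as reported: 39 of 48.
share_pct = round(39 / 48 * 100)
```

For one S, the partner ranks and re-ranked skill ranks of the same other Ss would be passed in the same order. The 39-of-48 share rounds to the reported 81%.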

There is a sharp difference of opinion be- 
tween Osborn and Taylor in regard to the 
effect of size of group upon inhibition and 
production of ideas. This is part of the 
broader need for further systematic study of 
the parameters of brainstorming processes in 
small groups as related to such problem vari- 
ables as level of difficulty, and time allot- 
ments; such subject variables as level of 
reward and risk, ego-involvement, amount of 
knowledge in the problem area, training, and 
intelligence; such group variables as size, 
cohesiveness, and skill perception; and such 
rater variables as areas of competence and 
reliability of ratings. 


DISCUSSION 


Brainstorming by pairs of superior adults will produce more unique ideas when the groups are trained in the method and composed of people who like to brainstorm together. This is true, however, only when they are working on ego-involving problems. The trained cohesive and trained nominal groups produced significantly more unique ideas than did their untrained counterparts. This training differential did not apply to the noncohesive groups. If groups are to be formed to work together on problems requiring creativity, consideration should be given to the desire of the Ss to work together. The nominal untrained groups produced significantly fewer unique ideas than all other groups except the noncohesive untrained groups. This finding indicates that individuals untrained in the method and working independently do significantly more poorly in the variables studied than even groups of noncohesive but trained Ss.

Thus, it would appear that even for un- 
trained Ss, one should attempt to establish 
groups made up of individuals who express a 
desire to work together and should avoid 
having such Ss work independently. This ap- 
plies only to ego-involving problems and 
where the partner choice is related to percep- 
tion of skill. 


SUMMARY 


Taylor found that group participation when 
using brainstorming methods inhibits creative 
thinking. In this study, variables of training 
and ego-involvement were studied in cohesive, 
noncohesive, and “nominal” groups of two 
members each, based on sociometric choice. 
Results: (a) Only on the ego-involving prob- 
lem were there significant differences among 
the groups and then only in number of unique 
ideas produced. The cohesive-trained groups 
were significantly better than all other groups. 
Even with untrained Ss, the cohesive groups 
did significantly better than the nominal 
groups. There was no significant difference 
between the trained and untrained noncohe- 
sive groups. (b) Sociometric choices for brain- 
storming partners were significantly related to 
the subjects’ perceptions of skill. Tentative 
suggestions were made on the basis of these 
findings to guide formation of creative think- 
ing groups. 
USE OF THE KUDER PREFERENCE RECORD,
PERSONAL, WITH POLICE OFFICERS


DAVID M. STERNE 


Veterans 


Difficulties in the selection of effective 
police officers and an interest in accumulating 
criterion group data for use in vocational 
counseling culminated in the administration 
of the Kuder Preference Record, Personal, to 
49 officers comprising the bulk of a small 
municipal force. This instrument is designed 
to provide five scales designated as follows: 
A, preference for being active in groups; B, 
preference for working in familiar and stable 
situations; C, preference for dealing with 
ideas; D, preference for avoiding conflict; 
and E, preference for directing others. In a previous study (Kuder, 1953), men in protective service occupations (such as policemen and firemen) who liked their work were found to exhibit low scores in Scale D and high scores in Scale E. Similar findings were anticipated with our subjects, and in addition it was hypothesized that D scores would correlate negatively and E scores positively with criterion ratings of general efficiency and value to the force. After the study was initiated, additional ratings were obtained of the behavior which each of the five scales is purported to measure, as defined in the record manual.


METHOD 


The Ss ranged in age from 27 to 66 (M = 40.4), in years of education from 7 to 16 (M = 11.4), and in length of service from 2 mo. to 28 yr. (M = 9.8 yr.). The criterion ratings were based on the mean sigma score obtained by each officer when rated independently by seven supervisors, each well acquainted with the ratee’s behavior on duty. Six of these supervisors provided the ratings of behavior defined by the Kuder scales. All of the ratings were done under a nine-class forced distribution system, and their mean intraclass correlations (Ebel, 1951) were: Criterion .89, A .71, B .59, C .69, D .90, and E .84.

Mean scores on each Kuder scale were compared with those of the normative base group and those obtained from men in protective service occupations (Kuder, 1953). Correlations were calculated between inventory scale scores and ratings of the behavior which the scales represent, and between the criterion rating, relevant test data, and measures descriptive of the sample.
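Two computations in the Method can be sketched in Python. The first is the criterion as a mean sigma (z) score across raters. The second is an intraclass correlation from a one-way analysis of variance of ratees by raters. Ebel (1951) discusses reliability both for single ratings and for averages of k ratings, and the source does not say which form is reported, so both are returned. This one-way form is one common version, offered as an illustration rather than the author's exact computation. The data layout (`ratings[i][j]` = rating of officer i by supervisor j) is assumed.

```python
import statistics
from typing import List, Tuple


def mean_sigma_scores(ratings: List[List[float]]) -> List[float]:
    """Convert each rater's ratings to z scores (population SD), then
    average across raters for each ratee. Assumes every rater varies."""
    n_raters = len(ratings[0])
    z_columns = []
    for j in range(n_raters):
        column = [row[j] for row in ratings]
        mu = statistics.mean(column)
        sd = statistics.pstdev(column)
        z_columns.append([(x - mu) / sd for x in column])
    return [statistics.mean(z[i] for z in z_columns) for i in range(len(ratings))]


def intraclass(ratings: List[List[float]]) -> Tuple[float, float]:
    """One-way ANOVA intraclass correlations: (single rating, mean of k ratings)."""
    n, k = len(ratings), len(ratings[0])
    grand = statistics.mean(x for row in ratings for x in row)
    row_means = [statistics.mean(row) for row in ratings]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average
```

Under a forced distribution every rater uses the same set of rating values, so the z-scoring mainly guards against unequal rater scales when a distribution is not forced.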


RESULTS 


Mean Kuder scores, presented in Table 1, differed significantly and in the predicted directions on Scales D and E from those found by Kuder in his normative base groups. Correlations of .31 (p < .05) and .47 (p < .01) were found between Scales C and E and ratings of the corresponding preferences for working with ideas and directing others. No significant correlations were found between the criterion rating and any of the Kuder scale scores, age, or years in service. Education correlated .28 with the criterion measure, .30 (p < .05) with scores in Scale …, and .43 (p < .01) with ratings of the corresponding behavior.
Names of the five lowest scoring officers on each of the Kuder scales were grouped together by scale, and a similar procedure was carried out with high scorers. The 10 resulting groupings plus two others of equal size were presented in random order, independently, to two raters (those who appeared to be the most conscientious and discerning of the seven), who were then asked to describe any identifying characteristics which they could associate with any of the groups. Through this procedure it was possible to discern some additional evidence in three of the scales for characteristics which they are purported to measure.

TABLE 1
KUDER PREFERENCE RECORD, PERSONAL (Form A): MEAN RAW SCORES

Sample
Kuder base group
Protective service occupations     …4.47
Police officers                    40.02ᵃ

ᵃ Police officers … level.
ᵇ Police officers … level.

In Scale A, preference for being
active in groups, also explained as “enjoying 
the opportunity to take the lead or be the 
center of attention” (Kuder, 1953), high 
scorers were described as “conceited, capable 
of being rude, and lacking in social grace,” or 
“giving one impression—of their own impor- 
tance.” Low scorers, on the other hand, were 
considered “polite, friendly, and among the 
best-natured and best-liked” men on the force, 
or, “neither extremely talkative nor preten- 
tious.” In Scale D, preference for avoiding 
conflict, high scorers were difficult to describe 
in modal terms. However, one rater saw four of the five low scorers as “complainers who don't seem to care whether or not they step on others' toes or hurt their feelings,” reacting to the public in the same way. The second rater described all five low scorers as having “edginess in their disposition,” apt to take offense easily, and possibly capable of intense hatred.
In Scale E, preference for directing others, high scorers were characterized as clean and neat, showing considerable initiative, and capable but without “throwing weight around.”
Low scorers were seen as polite and friendly 
to everybody and respectful to superiors. One 
rater further described them as “available for 
any detail, willing, and therefore very useful.” 


SUMMARY 


The Kuder Preference Record, Personal, 
was administered to 49 police officers. Com- 
parison with Kuder norm group data yielded 
highly significant differences in preferences 
for avoiding conflict and for directing others. 
Low but significant correlations were found 


between scores for preference for directing 


others and for working with ideas, and ratings 
of behavior demonstrating these character- 
istics. No correlations were demonstrated with 
a criterion rating of job efficiency and value. 
Education was shown to correlate with the criterion measure and with scores and ratings of preference for working with ideas. Modal characteristics of high and low scorers in three of the Kuder scales were identified and described.
IMPROVING CREDIT EVALUATION WITH
WEIGHTED APPLICATION BLANK¹


JAMES J. McGRATH 


Human Factors Research 


The weighted application blank is a fa- 
miliar tool to most personnel managers and 
has proved to be a valid predictor of per- 
formance on many different types of jobs. 
The work of a personnel manager is similar 
to that of a credit manager in that both often 
use biographical information from application 
blanks to predict the future behavior of in- 
dividuals. Since the prediction problems are 
similar, it is likely that the techniques of per- 
sonnel selection would also be useful in eval- 
uating applicants for credit. 


PROBLEM 


In the retail automobile business credit must often be extended to customers whose
failure to fulfill their contract obligations rep- 
resents potential losses to the dealer. When a 
sale ends in repossession of the automobile 


the dealer not only suffers a direct cash loss, 
but also the losses associated with increased 
clerical and handling costs and damaged pub- 
lic relations. If the professional credit man 
can accurately predict the risk involved in 
extending credit to certain individuals, the 
profit on credit sales can be materially in- 
creased. In this study an attempt was made 
to develop a credit evaluation tool, using the 
weighted application blank technique, to as- 
sist in identifying unprofitable groups of 
credit customers. The objective of the study 
was to distinguish between good and poor 
credit risks by analyzing the items of in- 
formation available at the time of application 
for credit. 


PROCEDURE


The procedure was similar to that used in the item analysis of application blanks for personnel selection. Criterion groups of credit risks for new car sales were established by randomly selecting from the files of a large automobile dealership 100 records of credit customers who paid for their cars as agreed and 100 records of credit customers who did not pay for their cars and whose cars were repossessed. These cases were considered representative of the desirable and the undesirable types of credit customers. In each case the credit sale had been made from 12 to 18 months prior to the date of the study.

¹ This study was supported in part by Human Factors Research, Incorporated, Los Angeles.


Since only data available at the time of the sale may be useful in prediction, the information on only two documents in each record was analyzed. These documents were the contract and the credit application blank. The contract contained 17 possible predictor items, such as percent down payment, number of installments, and trade-in allowances. The application blank contained 45 possible predictors, such as age, occupation, and income. These 62 items were subdivided into response categories, and the responses of the two criterion groups were tabulated. The original categories were then reorganized to simplify the items and, where possible, to maximize their discriminatory power. For each response category of every item, the percentage difference between groups was tested for statistical significance. Twenty-four items significantly discriminated the good and poor credit customers.
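The item analysis asks, for each response category, whether the percentage of paying customers giving that response differs from the percentage of repossessed customers giving it. The source does not name the test used; the sketch below assumes an ordinary two-proportion z test with a pooled standard error, and the category counts in the example are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z for the difference between proportions x1/n1 and x2/n2 (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical category: chosen by 60 of 100 paying and 40 of 100 repossessed customers.
z = two_proportion_z(60, 100, 40, 100)
print(f"z = {z:.2f}")  # |z| > 2.58 corresponds to the .01 level, two-tailed
```

The sign of z gives the direction of discrimination, which is what the unit weights of the later scoring systems record.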


A new selection of criterion groups was made con- 
sisting of 100 paying customers and 69 customers 
whose cars were repossessed. The 24 selected items 
were scored on the application blanks and contracts 
of these cross-validation groups. Separate scores were 
obtained from the contract items and the application 
blank items. These were then summed to yield a total 
score. Three different scoring systems were used: 

Scoring System A: Each response category was weighted according to its degree of significance in discriminating the standardization sample. These weights ranged from 0 to 14.

Scoring System B: Each response category received unit weight (−1, 0, or 1), with the algebraic sign indicating the direction of discrimination. A constant of +1 was added to these weights to eliminate the minus sign, yielding weights of 0, 1, or 2.

Scoring System C: Only those items which discriminated the standardization groups at the .01 level of confidence were scored, using the unit weights of Scoring System B.

TABLE 1
MEAN SCORES ON THE 24 SELECTED ITEMS AND VALIDITY COEFFICIENTS FOR THE CROSS-VALIDATION SAMPLE

                                     M              M          r_bis between
Scoring Method                   Repossession     Paying       Score and
                                    Group          Group       Criterion
Scoring System A                     76.6          102.0           …
Scoring System B                     18.2           23.8           …
  "B" score on contract               5.3            7.5           …
  "B" score on application blank     13.0           15.9           …
Scoring System C                      7.2           12.5           …

TABLE 2
ELIMINATION OF REPOSSESSED AND PAYING CUSTOMERS BY VARIOUS CUT-OFF SCORES USING UNIT ITEM WEIGHTS

"B" Score        Repossessed Customers    Paying Customers        Evaluation
(Total Scale)    Eliminated (Cum. %)      Eliminated (Cum. %)     Decision
…

RESULTS
With any of the three scoring systems, the mean score of the paying customers was significantly higher than the mean score of the repossession customers. The coefficients of correlation (biserial r's) between score on the selected items and the criterion were all significantly greater than zero well beyond the .01 level of confidence. These results are presented in Table 1. There was no significant difference between the correlation coefficients obtained by the different scoring systems. System B was not only the simplest scoring method but also yielded the highest validity. Neither the contract nor the application blank
alone was as valid as the total score. The cor- 
relation between contract score and applica- 
tion blank score was .28, indicating that these 
two measures were largely independent of 
each other. 
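The validity coefficients are biserial correlations between a continuous score and the dichotomous paid/repossessed criterion. A minimal sketch of the textbook biserial formula, r_bis = (M_p − M_q)/s × pq/y, where p and q are the proportions in the two groups, s is the standard deviation of all scores, and y is the ordinate of the unit normal curve at the point dividing p from q. The function name and example scores are hypothetical, and the use of the population SD is an assumption.

```python
from statistics import NormalDist, mean, pstdev

def biserial_r(scores_paying, scores_repossessed):
    """Biserial r between a score and a paid (1) / repossessed (0) criterion."""
    all_scores = list(scores_paying) + list(scores_repossessed)
    p = len(scores_paying) / len(all_scores)
    q = 1 - p
    s = pstdev(all_scores)
    # Height of the unit normal curve at the z that cuts off proportion p.
    y = NormalDist().pdf(NormalDist().inv_cdf(q))
    return (mean(scores_paying) - mean(scores_repossessed)) / s * (p * q / y)

# Hypothetical "B" scores for a handful of customers in each group.
paying = [24, 27, 21, 25, 30, 19, 23]
repossessed = [16, 20, 18, 22, 15]
print(round(biserial_r(paying, repossessed), 3))
```

Unlike the point-biserial r, this coefficient assumes a normally distributed latent criterion behind the paid/repossessed split, and it can exceed 1 in small or skewed samples.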

A table was then developed indicating the 
percentage of repossession and paying custo- 
mers eliminated by each possible cut-off score, 
using Scoring System B. Table 2 shows that 
if all applicants receiving a score of 13 or less 
had been eliminated at the time of applica- 
tion for credit, 19% of sales ending in re- 
possession would have been eliminated while 
sacrificing 1% of paid sales. It may also be 
noted that none of the customers who scored 
28 or higher had his car repossessed. This 
rather select group of superior credit custom- 
ers made a large proportion of the total pur- 
chases during the time period sampled. 
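A table like Table 2 can be built by counting, for each possible cut-off, the cumulative percentage of each criterion group scoring at or below it. A sketch, assuming that "eliminated by a cut-off of c" means "score of c or less," as in the text's "a score of 13 or less"; the score lists are hypothetical.

```python
def cutoff_table(paying, repossessed):
    """Rows of (cut-off, % repossessed eliminated, % paying eliminated), cumulative."""
    rows = []
    everyone = paying + repossessed
    for c in range(min(everyone), max(everyone) + 1):
        rep = 100 * sum(s <= c for s in repossessed) / len(repossessed)
        pay = 100 * sum(s <= c for s in paying) / len(paying)
        rows.append((c, rep, pay))
    return rows

paying = [14, 18, 20, 22, 23, 25, 26, 28, 29, 30]        # hypothetical "B" scores
repossessed = [10, 12, 13, 15, 17, 19, 21, 22, 24, 26]
for c, rep, pay in cutoff_table(paying, repossessed):
    print(f"{c:>3}  {rep:6.1f}  {pay:6.1f}")
```

Reading down the columns shows the trade-off directly: each higher cut-off removes more repossessions at the cost of more paid sales.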


DISCUSSION 


The 24 selected items were demonstrated to 
have considerable validity for discriminating 
good and poor credit customers in a cross- 
validation sample. The estimate of validity 
was probably conservative. Every case in the 
sample had already been selected by the exist- 
ing credit evaluation method as an acceptable 
credit risk. With an unselected sample, the 
validity of the selection device would likely 
be higher. These results were comparable to 
those obtained from similar studies (Myers & 
Cordner, 1957; Wolbers, 1949). 

The cut-off scores indicated in Table 2 
should be regarded as only suggestive, and 
should themselves be subjected to cross-vali- 
dation. Then they may be used in the follow- 
ing way: a clerk can score the selected items 
for each credit applicant, rejecting all appli- 
cants who score below the lower cut-off, ac- 
cepting all who score above the upper cut-off, 
and sending those who score in between to the 
credit manager for further evaluation. This 
procedure would not only improve upon the 
validity of the existing credit evaluation 
method, but would reduce the credit man- 
ager’s work load and credit department oper- 
ating costs. 
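The proposed clerical routine amounts to a unit-weighted total (Scoring System B) followed by a three-way decision. In the sketch below the item names echo examples from the text, but the response categories, their weights, and the cut-off values are hypothetical placeholders, not the study's 24 items or its cut-offs.

```python
# Hypothetical scoring key: item -> {response category: unit weight in {-1, 0, +1}}.
# These weights are illustrative only, not the study's findings.
KEY = {
    "down_payment_pct": {"under_25": -1, "25_to_33": 0, "over_33": 1},
    "installments": {"36": -1, "24_to_30": 0, "under_24": 1},
    "age": {"category_1": -1, "category_2": 0, "category_3": 1},
}

def score_b(application):
    """Scoring System B: unit weight + 1, so each item contributes 0, 1, or 2."""
    return sum(KEY[item][response] + 1 for item, response in application.items())

def decide(score, lower, upper):
    """Reject below the lower cut-off, accept above the upper, refer the rest."""
    if score < lower:
        return "reject"
    if score > upper:
        return "accept"
    return "refer to credit manager"

applicant = {"down_payment_pct": "over_33", "installments": "36", "age": "category_2"}
s = score_b(applicant)
print(s, decide(s, lower=2, upper=4))
```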

An estimate of the net profit or loss result- 
ing from the use of various cut-off scores can 
be determined provided certain information is 


available. This estimate may be derived from 
this simple formula: 


\[ S = Q\,M_n E_n - P\,M_p E_p - A \]


Where: S = estimated savings per credit sale using the weighted application blank
P = proportion of paying customers in the defined population of credit customers
Q = proportion of nonpaying customers in the defined population of credit customers
M_p = mean profit on credit sales to paying customers
M_n = mean loss on credit sales to nonpaying customers
E_p = proportion of paying customers eliminated by the cut-off score
E_n = proportion of nonpaying customers eliminated by the cut-off score
A = mean cost of administering and scoring the weighted application blank


As an example of the use of this formula, we may apply it to the present study in this way: Of total credit sales during the period covered by the study, 83% were paid as agreed. Therefore, P is estimated to be .83 and Q is .17. The mean profit on paid sales (M_p) was estimated to be $350.00. The mean loss on repossessions (M_n) was estimated to be $200.00. A cut-off score of 13 eliminated 1% of paid sales and 19% of repossessions, so E_p equals .01 and E_n equals .19. The mean cost of administration and scoring (A) was not more than $0.50. Putting these values* into the formula, the net profit using a cut-off score of 13 would be $3.05 per credit sale. With approximately 2,000 sales per year, the annual profit would be $6,000.
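Substituting the example values reproduces the figure in the text: .17 × 200 × .19 − .83 × 350 × .01 − .50 = 6.46 − 2.905 − 0.50 ≈ $3.05. A direct transcription of the formula (the function name is ours; the inputs include the hypothetical values mentioned in the footnote):

```python
def savings_per_sale(P, Q, M_p, M_n, E_p, E_n, A):
    """S = Q*M_n*E_n - P*M_p*E_p - A: losses avoided minus profits forgone minus cost."""
    return Q * M_n * E_n - P * M_p * E_p - A

s = savings_per_sale(P=0.83, Q=0.17, M_p=350.00, M_n=200.00, E_p=0.01, E_n=0.19, A=0.50)
print(f"savings per credit sale: ${s:.3f}; per 2,000 sales: ${2000 * s:,.0f}")
```

Multiplying out, 2,000 × $3.055 ≈ $6,110, which the text rounds to about $6,000.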

A separate analysis of the results of this 
study was made in which the contract scores 
and the application blank scores were used as 
successive hurdles. That is, the applicant had 
to pass a cut-off score on both the contract 
and the application blank before being ac- 
cepted. With this method, 13% of reposses- 
sions were eliminated without losing any paid 
contracts. Net increase in profit using this 
method was $4.34 per credit sale (changing only the values E_p and E_n in the equation). It will be noted that when only nonpaying customers are eliminated and no paying customers are lost by using a particular cut-off score, the second term in the equation reduces to zero.

* With the exception of E_p and E_n, hypothetical values have been used in place of the true values because of the confidential nature of such information.

Using this dollar criterion the most profit- 
able cut-off score may be determined. By 
transposing the equation, one may also esti- 
mate the degree of discrimination demanded 
by any instrument for it to be a profitable 
predictor. For example, if the incidence of 
nonpayment (Q) is low and the loss sustained from nonpaying customers (M_n) is low compared with the profit on paying customers (M_p), it is likely that a study of this sort
would not be profitable in terms of the level 
of validity one may expect from such meas- 
ures. This would allow an investigator to esti- 
mate beforehand his chances of developing a 
usable predictor. 
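Setting S = 0 and solving for E_n gives the smallest proportion of nonpaying customers a cut-off must eliminate just to break even: E_n = (P·M_p·E_p + A)/(Q·M_n). A minimal sketch, using the example's values and then a hypothetical low-risk population (Q = .03, M_n = $100) to illustrate the point above; a required E_n greater than 1 means no cut-off could pay for itself at that level of E_p.

```python
def breakeven_nonpayer_elimination(P, Q, M_p, M_n, E_p, A):
    """Smallest E_n for which S = Q*M_n*E_n - P*M_p*E_p - A is not negative."""
    return (P * M_p * E_p + A) / (Q * M_n)

# Example values from the text: losing 1% of paying customers.
print(round(breakeven_nonpayer_elimination(0.83, 0.17, 350.0, 200.0, 0.01, 0.50), 3))
# Hypothetical low-incidence, low-loss population: the requirement exceeds 1.
print(round(breakeven_nonpayer_elimination(0.97, 0.03, 350.0, 100.0, 0.01, 0.50), 3))
```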




SUMMARY 


A method of personnel selection was applied to the evaluation of credit applicants. A weighted credit application blank was developed which significantly discriminated the good and poor credit customers in a cross-validation sample. A formula was presented which allows the profit resulting from the use of any particular cut-off score to be estimated.
VOLUNTEERING FOR
EXTRA-HAZARDOUS DUTY¹


JOHN T. BAIR AND THOMAS J. GALLAGHER²


United States Naval School of Aviation Medicine, Pensacola, Florida 


Of the many implications of the nuclear 
age, one of the most obvious and least pleas- 
ant is its danger to man. As time goes on, and 
man is exposed to larger radiation doses, just 
learning to survive at all may become a haz- 
ardous occupation. For men in the military 
services, there will be extra hazards, those 
that have always gone hand-in-hand with 
military activities. In the future, however, 
these dangers undoubtedly will be intensified. 
In the past, assignment to these activities has 
been made on a voluntary basis. It is impor- 
tant, therefore, that we should give some 
thought to the type of men who volunteer
for them. 

Several years ago one of the research psy- 
chologists at the United States Naval School 
of Aviation Medicine asked for volunteers for 
an experiment involving exposure to hypoxia 
conditions. One of the volunteers became so 
emotionally disturbed that he was withdrawn 
from the experiment and referred to the hos- 
pital for psychiatric treatment. Was this a 
typical situation? Are volunteers for hazard- 
ous duty assignments likely to be the kind 
who go to pieces in an emergency?


Although it is common practice in the mil- 
itary to select men for extra-hazardous duty 
from groups of volunteers, little research has 
been conducted in this area. Furthermore, the 
results of the few studies that have been made 
are controversial. It has been suggested by 
some investigators that volunteers are likely 


to be severely maladjusted. For example, 
Lasagna and Von Felsinger (1954) found 
that volunteer subjects (Ss) for pharmacological experiments showed an incidence of serious psychological difficulties that was twice as high as would be expected in an unselected college population.

¹ The opinions or assertions contained herein are the private ones of the authors and are not to be construed as official or reflecting the views of the Navy Department or the naval service at large.

² Now at the Air Crew Equipment Laboratory, Naval Air Material Center, Philadelphia 12, Pennsylvania.

Riggs and Kaess (1955)
found that their volunteers for a psycholog- 
ical experiment were significantly higher on 
introversive thinking and were more moody 
emotionally than nonvolunteers. On the other 
hand, Maslow and Sakoda (1952) found that 
their volunteers consistently have a higher 
mean self-esteem score than nonvolunteers, 
and that the difference between volunteers 
and nonvolunteers on a security-insecurity
score was not significant. Thus they con- 
cluded that this “excludes the possibility that 
volunteers were more neurotic than nonvolun- 
teers.” In an analysis of combat fighters and 
nonfighters by the Human Resources Re- 
search Office (Egbert, Meeland, Cline, Forgy, 
Spickler, & Brown, 1957), it was found that 
the fighter tends to be more intelligent, more 
socially mature, more emotionally stable, and 
to volunteer for extra-hazardous duty assign- 
ments more frequently than the nonfighter. 
Rosenbaum and Blake (1955) found more 
men volunteer for a research project when 
they observe a project assistant volunteer 
than when the assistant declines to volunteer. 
Finally, Schachter and Hall (1952) reported 
students volunteer more readily for psycho- 
logical experiments when the restraints against 
volunteering were low than when restraints 
were high. 

In an attempt to learn more about the char- 
acteristics of the men who “step forward” 
when the call for volunteers is given, the 
authors undertook an investigation involving 
1,154 naval aviation cadets enrolled in the 
United States Naval School of Pre-Flight, 
Pensacola, Florida. Specifically, the purpose 
of the study was to relate willingness to volun- 
teer for dangerous tasks with other variables 
and thus, if possible, to characterize the vol- 
unteer and nonvolunteer more precisely. 


The requests for volunteers for the present project were varied in a counterbalanced design under three different experimental conditions, as shown in Fig. 1.

Fig. 1. Experimental design.

The 1,154 NavCad Ss for this research were distributed among these conditions in the following manner:


1. Five hundred and ninety-two of the Ss were asked to volunteer for an experiment involving exposure to extreme cold temperatures, and 562 Ss were requested to volunteer for an experiment involving exposure to cosmic radiation. In actuality these projects were fictitious, but the Ss did not know this.

2. Seven hundred and twenty-one Ss were asked to volunteer during the first week and 433 during the fifteenth week of preflight school.

3. Finally, 630 Ss were requested to sign their names on a roster publicly in front of the group, and 524 were asked to sign their names on small slips of paper and pass them concealed to the investigators.

In all, 489 NavCads volunteered under these experimental conditions and 665 did not volunteer. Psychological test results, preflight grades, age, education, and attrition data were available for both the volunteers and the nonvolunteers.


Let us first look at the results for the dif- 
ferent experimental conditions. Significantly 
more Ss volunteered for the cold-exposure ex- 
periment than the radiation-exposure experi- 
ment, and significantly more Ss volunteered 
during the fifteenth rather than the first week 
of training. There also were significant differ- 
ences in the number of volunteers who signed 
their names publicly during the fifteenth week 
as contrasted with the first week. 

The scores of the Minnesota Multiphasic 
Personality Inventory were available for 192 
volunteers and 291 nonvolunteers. There was 
a significant difference in the scores between 
the two groups on only one scale. This was 
for the My, scale on which the volunteers 
scored higher than the nonvolunteers. Neither 
group deviated from the college student norms 
on this scale, however. There were no signifi- 
cant differences between the two groups on 
the Aviation Qualification Test (general in- 
telligence), Mechanical Comprehension Test, 
and Flight Aptitude Rating scores, but the 
volunteers did have higher scores on all these 


tests. On the whole, the volunteers were younger and had less education than the nonvolunteers.

The volunteers also had significantly higher 
officer-like-quality grades than nonvolunteers. 
Furthermore, there were considerably more 
preflight student class officers among the vol- 
unteers than the nonvolunteers. Since there 
were no class officers selected during the first 
week of training, and there were more openly 
known volunteers for the fifteenth than the 
first week, it is likely that a subtle social pressure was operating here. That is, the larger number of class leaders volunteering led others to volunteer. This result is consistent with that of Rosenbaum and Blake (1955), where a project assistant was observed to volunteer and thus influenced others to volunteer.

The greatest differences between the two groups occurred in the area of attrition from the flight training program. These were as follows (Table 1):


1. Volunteers had significantly less overall 
attrition than the nonvolunteers. 

2. Nearly twice as many nonvolunteers 
dropped at their own request from the flight 
training program as did volunteers. 

3. Nonvolunteers had more attrition for 
disciplinary reasons than volunteers. 

4. There were more deaths due to training 
accidents among the volunteer group than the 
nonvolunteers, but this difference was not 
statistically stable because of the low number 
of cases. 

5. There was no significant difference be- 
tween the groups for attrition due to flight 
failure. 


It is evident from these results that far 
from being seriously disturbed, the volunteers 
for these hazardous projects were actually 
superior in many respects to the nonvolunteers. This was particularly true insofar as the
desire to complete flight training was con- 
cerned. The volunteers also excelled in leader- 
ship, as indicated by the fact that more of 
them were selected as preflight student class 
officers than nonvolunteers, and by their 


higher officer-like-quality grades. Although we 
have to be cautious in generalizing to other
volunteer groups, it is possible that because 
naval aviation cadets are carefully screened 
before acceptance into the Naval Flight 
Training Program, there are few maladjusted 
individuals among them. Therefore, when ask- 
ing for volunteers from this group of men we 
were appealing to those who were most 
strongly interested in the training program in 
general and not maladjusted, and they dem- 
onstrated this interest by cooperating in these 
research projects. 

However, it seems to us that one of the most important findings from this research was that you can influence the amount of volunteering by manipulating the experimental conditions for volunteering. This appears to be a fruitful area for further research which may have important implications for future hazardous military missions.

Another conclusion seems warranted. From the results of the present project and previous investigations, many types of research are vulnerable to “volunteer bias.” Because volunteers do differ significantly from nonvolunteers on many characteristics, caution must be exercised in generalizing from results obtained from a volunteer sample to a parent population.


REFERENCES

Egbert, R. L., Meeland, T., Cline, V. B., Forgy, E. W., Spickler, M. W., & Brown, C. Fighter I: An analysis of combat fighters and non-fighters. Hum. Resour. Res. Off. tech. Rep., 1957, No. 44.

Lasagna, L., & Von Felsinger, J. M. The volunteer subject in research. Science, 1954, 120, 359-360.

Maslow, A. H., & Sakoda, J. M. Volunteer error in the Kinsey study. J. abnorm. soc. Psychol., 1952, 47, 259-262.

Riggs, M. M., & Kaess, W. Personality differences between volunteers and nonvolunteers. J. Psychol., 1955, 40, 229-245.

Rosenbaum, M., & Blake, R. R. Volunteering as a function of field structure. J. abnorm. soc. Psychol., 1955, 50, 193-196.

Schachter, S., & Hall, R. Group-derived restraints and audience persuasion. Hum. Relat., 1952, 5, 397-406.

(Received December 19, 1959)





Journal of Applied Psychology
1960, Vol. 44, No. 5, 332-335


PERSONALITY CHARACTERISTICS OF ENGINEERS 
AS MEASURED BY THE EDWARDS PERSONAL 
PREFERENCE SCHEDULE 


CARROLL E. IZARD


Vanderbilt University 


For a number of years this country has 
been faced with the problem of educating a 
sufficient number of engineers to meet the 
challenge of national defense, the conquest of 
space, and a number of other scientific and 
technological frontiers. There are problems 
both in recruiting appropriate talent for the 
engineering schools and in effective utilization 
of the individuals once they have become 
qualified engineers. Psychology may make a 
contribution here by efforts to increase our 
knowledge of the personality correlates of 
success in engineering. One approach along 
this line is to determine the effectiveness of 
practicable personality assessment devices for 
differentiating engineers and nonengineers and 
to relate these results to meaningful con- 
structs in personality theory. 

In a recent article, Harrison, Tomblen, and 
Jackson (1955) reported the results of an in- 
tensive study of the personalities of 100 
mechanical engineers. These investigators 
utilized projective techniques, a personality 
inventory, and a clinical interview to obtain 
their data. In general, their findings were con- 
firmed by the present author using an inten- 
sive clinical interview and a battery of objec- 
tive tests. It appeared, however, that most of 
the personality variables uncovered in the two 
projects were effectively measured by one of 
the objective tests used by the present inves- 
tigator. This was Edwards’ Personal Prefer- 
ence Schedule (PPS). 

The findings from other studies utilizing 
objective measures are not entirely consistent. 
Goodman (1942) utilized the Bernreuter to 
show that engineering students were less neu- 
rotic and more self-sufficient than liberal arts 
students. However, Blum (1947) found no 
difference between engineering and nonengi- 
neering students on the MMPI. 

In the present study, the efficacy of the 
PPS for measuring personality characteristics 




of engineers will be evaluated by comparing 
graduate engineers with Edwards’ male norm 
group and by comparing a freshman class of 
engineering students with a freshman group 
of liberal arts students within the same uni- 
versity. Further, this study will compare the 
average personality profile for engineers ob- 
tained on the PPS with that obtained by 
Harrison, Tomblen, and Jackson utilizing pro- 
jective techniques, a personality inventory, 
and the interview. 


PROCEDURE 


Subjects. The experienced engineers were all those 
below the level of subsection manager in the engi- 
neering section of a department of the General Elec- 
tric Company. Their age ranged from 23 to 49, with 
a mean of 33.39 years. Seventy-eight of the 81 had 
degrees from accredited engineering schools. Most of 
them were easterners who had attended eastern schools. Sixty-three were mechanical engineers, 9 electrical, 2 aeronautical, 2 industrial, and 1 chemical. Four held nonengineering degrees: 2 in physics, 1 in metallurgy, and 1 in chemistry. These 4 were included since their work experience and present functions were in engineering. Nineteen of the 81 held graduate or professional degrees: 9 the MSME, 1 the MSAE, 1 the MSCE, 2 the MS (Physics), 2 the MBA, and 4 the LLB. This group was compared with the 750 liberal arts students who constitute Edwards' male norm group (Edwards, 1954).

In addition to the above sample of General Electric engineers, a group of beginning engineering students was evaluated. All students who entered Vanderbilt University as undergraduates in 1957 were given the PPS. The 173 students who were still in the engineering school at the end of their first semester (28 had withdrawn or failed) were compared with a randomly selected group of 173 men in the college of arts and science. Since only about 55% of an entering class of engineers at Vanderbilt complete an engineering degree, this second-semester group is far from being a “pure” group of engineers.

Test. The PPS is a forced-choice personality inventory which measures 15 manifest needs or personality characteristics. A separate key yields a measure of internal consistency. The names of the personality characteristics and much of the content of the inventory were taken from Murray (1938). Each item of the test is a forced choice between a pair of statements which are matched in terms of social desirability. The average individual requires about 50 minutes to complete the inventory.

The PPS was given to the engineers in groups of eight to ten as a part of the test battery utilized in a manpower development program. The engineers were told that the test and interview data would be utilized to help them in their personal and professional development. Their cooperation throughout the tests and interviews was excellent.

Analysis of Data. The average profile for engineers was compared with that of Edwards' male norm group by an analysis of variance technique. This is an extension of an analysis of variance technique for comparing personality profiles of pairs (Izard, 1960). In addition, the engineers' mean scores on the 15 PPS characteristics were compared with Edwards' norm data by means of t tests. The two freshman groups were analyzed in the same way.

It should be noted that the t tests were considered only a rough guide as to the significance of the difference between a given pair of means. While any particular t can be interpreted in the usual way, the t's as a series are not independent tests. This is because the 15 PPS characteristics are intercorrelated and the scores for an individual or group sum to a constant. This violation of the assumptions underlying the t test may not be of much practical consequence: the intercorrelations between characteristics are low, and an extremely high or low score on a given characteristic can affect the scores on all other characteristics equally.
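The point about scores summing to a constant can be made concrete. In a forced-choice (ipsative) format every response adds a point to exactly one scale, so each person's total is fixed; the covariances among the scales must then sum to zero, which pulls the average intercorrelation toward −1/(k − 1), about −.07 for 15 scales, consistent with the remark that the intercorrelations are low. The simulation below is a hypothetical illustration only: it assumes one presentation of each pair of scales (105 items) and a logistic choice model, which is not the PPS's actual item structure.

```python
import itertools
import math
import random

def simulate_forced_choice(n_people=300, k=15, seed=1):
    """Hypothetical forced-choice inventory: each pair of the k scales is presented
    once and the respondent endorses one statement, adding 1 to that scale."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_people):
        latent = [rng.gauss(0, 1) for _ in range(k)]  # assumed need strengths
        row = [0] * k
        for i, j in itertools.combinations(range(k), 2):
            # Logistic choice model (assumption): stronger need is endorsed more often.
            p_i = 1 / (1 + math.exp(-(latent[i] - latent[j])))
            row[i if rng.random() < p_i else j] += 1
        scores.append(row)
    return scores

def covariance_matrix(scores):
    n, k = len(scores), len(scores[0])
    means = [sum(r[a] for r in scores) / n for a in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in scores) / (n - 1)
             for b in range(k)] for a in range(k)]

scores = simulate_forced_choice()
k = len(scores[0])
cov = covariance_matrix(scores)
corr = [[cov[a][b] / math.sqrt(cov[a][a] * cov[b][b]) for b in range(k)] for a in range(k)]
off_diag = [corr[a][b] for a in range(k) for b in range(k) if a != b]
mean_r = sum(off_diag) / len(off_diag)

print("distinct row totals:", {sum(r) for r in scores})          # a single constant
print("sum of all covariances:", round(sum(map(sum, cov)), 8))   # zero up to rounding
print(f"mean intercorrelation {mean_r:.3f} vs -1/(k-1) = {-1 / (k - 1):.3f}")
```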


RESULTS 


The analysis of variance comparing the PPS profiles of General Electric engineers and Edwards' male norm group is presented in Table 1. The highly significant F ratio indicates that the average profiles of the two groups are substantially different. The variances pooled for the t tests were considered acceptably homogeneous for 12 of the 15 variables by simple F test. A p of .01 was set as the appropriate level of rejection, since the smaller variance was arbitrarily designated as the denominator of the F ratio and since 15 such tests were required. A t for samples with


TABLE 1
ANALYSIS OF VARIANCE COMPARING THE PROFILES OF EXPERIENCED ENGINEERS AND EDWARDS' MALE NORM GROUP

Source                        df        MS         F        p
Between Variables             14     4509.46
Variables × Groups            14      301.41    13.34    <.005
Variables × Individuals    11746       22.59

TABLE 2
ANALYSIS OF VARIANCE COMPARING THE PROFILES OF ENGINEERING AND LIBERAL ARTS FRESHMEN GROUPS

Source                        df
Between Variables             14
Variables × Groups            14
Variables × Individuals     4816


unequal variances (Bliss & Calhoun, 1954, 
pp. 84-85) was utilized for the other three 
variables: Deference, Succorance, Endurance. 
Considering the ¢’s as independent tests, engi- 
neers have significantly greater means on 
Achievement, Deference, Order, Dominance, 
and Endurance and _ significantly smaller 
Affiliation, Intraception, Succor- 
ance, Abasement, Nurturance, and Heterosex- 
uality. All of these would be significant at the 
001 level Intraception and Hetero- 
sexuality which were at the .02 level. 


means on 


except 
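The tabled F can be checked directly from the mean squares. The sketch below assumes, as the tabled value implies, that the Variables × Individuals mean square is the error term for the Variables × Groups (profile shape) effect, and that the error df follows the usual mixed-design form (k − 1)(N₁ + N₂ − 2); applied to the 173 + 173 freshmen, that form reproduces the 4816 df of Table 2.

```python
# Mean squares transcribed from Table 1 (engineers vs. Edwards' male norms).
ms_variables_by_groups = 301.41       # df = 14
ms_variables_by_individuals = 22.59   # df = 11746, assumed error term

f_ratio = ms_variables_by_groups / ms_variables_by_individuals
print(f"F(14, 11746) = {f_ratio:.2f}")   # F(14, 11746) = 13.34

def profile_error_df(n_variables, n_group1, n_group2):
    """Assumed df for Variables x Individuals: (k - 1) * (N1 + N2 - 2)."""
    return (n_variables - 1) * (n_group1 + n_group2 - 2)

# Consistency check against Table 2: 173 engineering vs. 173 liberal arts freshmen.
print(profile_error_df(15, 173, 173))   # 4816
```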


The analysis of variance comparing the freshmen in the school of engineering to male freshmen in the college of arts and science is presented in Table 2. The significant F indicates that the average profiles of the two groups of students are considerably different. Ten of the mean differences on the 15 separate variables were in the same direction as those between graduate engineers and Edwards’ norm group. The t’s comparing the variable means for the student groups exceeded the value normally required for a one-tailed test of significance at the .005 level for Order, Endurance, and Intraception; at the .05 level for Nurturance (t modified for unequal variances); and at about the .06 level for Affiliation. Only two of the four mean differences which reversed signs were of considerable size: Dominance and Aggression means for liberal arts students were higher than those for engineers. The t’s considered in the usual way would be significant at the .05 and .01 levels, respectively.
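The procedure above pools variances for most scales, screens homogeneity with a larger-over-smaller F ratio, and switches to an unequal-variance t for three scales. Because the larger variance always goes in the numerator, a tabled one-tailed F probability understates the true rejection rate by half, which is consistent with the text’s stricter .01 criterion. The exact Bliss and Calhoun (1954) formula is not reproduced here; as a hedged stand-in, this sketch uses Welch’s t with Welch-Satterthwaite degrees of freedom, one common unequal-variance procedure. The sample scores are made up.

```python
import math
from statistics import mean, variance

def variance_ratio(x, y):
    """Larger sample variance over smaller: a simple F test of homogeneity."""
    vx, vy = variance(x), variance(y)
    return max(vx, vy) / min(vx, vy)

def pooled_t(x, y):
    """Student's t with pooled variance; df = nx + ny - 2."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def welch_t(x, y):
    """Welch's unequal-variance t with Welch-Satterthwaite df."""
    nx, ny = len(x), len(y)
    ex, ey = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(ex + ey)
    df = (ex + ey) ** 2 / (ex ** 2 / (nx - 1) + ey ** 2 / (ny - 1))
    return t, df

# Made-up scores for two small groups.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(variance_ratio(x, y))   # 4.0
print(pooled_t(x, y))
print(welch_t(x, y))
```

With equal group sizes the two t values coincide; the unequal-variance version differs only in its smaller, fractional df.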


DISCUSSION 


The PPS differentiated experienced engi- 
neers and the liberal arts students of Ed- 
wards’ norm group quite effectively. This 
finding was substantiated by the fact that the
PPS also differentiated engineering freshmen 
from liberal arts freshmen. The differences be- 
tween the freshmen groups were not as great 
as those between experienced engineers and 
Edwards’ norms. This was expected since 
second semester engineering freshmen were 
less than 65% “pure” engineers, judging from 
the usual attrition rates of the school. Only 
for Dominance and Aggression were the mean 
differences between the student groups signif- 
icant in the opposite direction to that for the 
other groups. These reversals are not incon- 
sistent with the observation of Harrison, 
Tomblen, and Jackson that engineers become 
more confident and show less social reticence 
with increasing maturity and experience. 

The difference between experienced engi- 
neers and Edwards’ norm group on Heterosex- 
uality may be more a function of differences 
in age and marital status than personality. 
The engineers were somewhat older and sex- 
ual activity is known to decrease with age. 
Further, items on this scale are framed largely 
in terms of premarital social relations with 
the opposite sex. Most of the experienced
engineers had been married for several years 
while most of Edwards’ college norm group 
presumably were unmarried. 

The findings of the present study agree 
rather closely with those of other investigators 
utilizing quite different personality assessment 
techniques (Harrison et al., 1955; Moore & 
Levy, 1951; Steiner, 1953). As indicated 
earlier, Harrison, Tomblen, and Jackson uti- 
lized projective techniques, objective tests, 
and interview in a study of 100 mechanical 
engineers. For comparative purposes, the find- 
ings of the present study in terms of relative 
standing on PPS scales are placed alongside 
the corresponding findings of Harrison, Tomblen, and Jackson: high Achievement (great involvement with work and a striving for achievement); high Deference (relies heavily on authority for settling of issues); high Order (fondness for structure and order, aversion to ambiguity); low Affiliation (social participation based on conventionality and social conformity rather than any profound interest in people); low Intraception (analytical interest in people rare, avoidance of introspection and self-examination); low Succorance (self-sufficient); high Dominance (decisive, tough-minded, straightforward, direct, masculine); low Nurturance (impersonal, prefer objects and processes to people); high Endurance (energetic, goal oriented, conscientious); low Abasement (authoritarian approach, nonintrospective, impersonal). The degree of congruence between the two studies is striking, especially in view of the difference in personality measurement devices. The amount of agreement is particularly encouraging in view of the fact that the present findings were based on a single objective inventory which can be group administered in approximately one hour. The PPS has the further advantage of utilizing the forced-choice technique to reduce the effects of the social desirability of items on test taking (Edwards, 1954; Silverman, 1957).


The present findings as well as those of 
previous investigators in this area suggest that 
engineers, as compared with other individuals, 
invest or express relatively more positive af- 
fect in relation to objects and processes and 
less in relation to people. The engineers were high on Achievement, Order, and Endurance, the three PPS characteristics which in terms of item content appear most relevant to affective investment in objects, tasks, or processes. They were low in Affiliation, Intraception, and Nurturance, the three PPS characteristics which appear to be most relevant to the establishing of satisfying affective ties between people. This might be viewed as a difference in characteristic ways of expressing and responding to affect or in affective orientation: in this case, predominant object affect as opposed to predominant interpersonal affect. Evidence for the meaningfulness of these concepts as explanatory constructs in personality theory was presented earlier (Izard, 1960).


SUMMARY 
The average personality (PPS) profile of 
81 experienced engineers and that of the 750 
male liberal arts students in Edwards’ norm 
group were found to be significantly different by an analysis of variance technique. There were substantial and apparently meaningful
differences on 10 of the 15 PPS scales. The 
engineers were higher on Achievement, Defer- 
ence, Order, Dominance, and Endurance and 
lower on Affiliation, Intraception, Succorance, 
Abasement, and Nurturance. A similar analy- 
sis of variance comparing 173 engineering 
freshmen with 173 liberal arts freshmen also 
yielded a significant F. Ten of the differences
between means on the 15 characteristics were 
in the same direction as for the experienced 
engineers and the norm group. Four of these 
were of considerable magnitude: Order, En- 
durance, Intraception, Nurturance. 

These findings were consistent with those 
of Harrison, Tomblen, and Jackson, who utilized projective techniques, a personality inventory, and a clinical interview. The results pointed to the plausibility of affective orientation (characteristic ways of expressing or responding to positive affect) as a theoretical construct with some general utility in explaining personality differences such as those found between engineers and nonengineers.
Among the factors instrumental in maintaining a high level of detection performance
during a vigilance task are knowledge of re- 
sults (Baker, 1959a; Mackworth, 1950), high 
signal frequency (Deese, 1955), intersignal 
regularity (Baker, 1959a), and the known 
presence of the experimenter (Fraser, 1953). 
With respect to the latter, presumably the E
is judged to be a peer figure who “monitors 
the monitor.” 

In a typical operational or industrial situa- 
tion these factors would operate rarely, if at 
all. If the radar operator does not detect the 
target, who is there to inform him that he has 
missed? Similarly, frequency and regularity 
are beyond system control. 

It has been proposed (Baker, 1960) that these factors can be brought into play, however, by use of “artificial” signals which would not be discriminably different from “real” signals. Knowledge of results could be given when an artificial signal was detected or missed. Such artificial signals would increase the apparent signal frequency and could be programed to reduce unusually long intervals between real signals, i.e., would introduce greater temporal regularity. As the knowledge of results would be given by a senior person, the advantage would obtain of having a peer figure “monitoring the monitor.”


METHOD


Subjects (Ss) seated in a cubicle faced a vertical 4-in. square screen of ground glass. Viewing distance was 20 in. The signal, a 2 mm. dot of light lasting 0.6 sec., always appeared in one of the 4 corners of the square (14 in. along the diagonal), in random order of sequence.

Two conditions were compared. The control condition consisted of the familiar Mackworth sequence of signals (Mackworth, 1950) involving intersignal intervals of ¾, ¾, 1½, 2, 2, 1, 5, 1, 1, 2, 3, and 10 min. each ½ hr. These intervals were employed in a different random order for each ½ hr. and for each S. As the task lasted 1½ hr., each S was presented with 36 signals.

For the experimental condition intersignal intervals for the artificial signals were arbitrarily selected to be 2¼, 1¾, 144, 2, and 2% min. and were injected in that order as many times as necessary to fill in the temporal gaps between the real signal sequence used in the control condition. Specifically, each S in the experimental group was assigned a sequence of real signals which duplicated that of an S in the control group. For example, if the real sequence for an S began 114, 3¾, 8% min., an artificial signal was injected at 6 min. (2¼ min. after the real signal at 3¾ min.) and at 7¾ min. (1¾ min. after the artificial signal at 6 min.). An S whose schedule would not present a real signal until 10 min. of the task had elapsed would receive artificial signals at 2¼, 4%, 534, and 74 min. Whenever the program scheduled an artificial and a real signal simultaneously, only the real signal was presented. Signal frequency, real plus artificial, ranged from 58 to 63 over the 1½-hr. period. See Fig. 1.

The S’s task was to press a button whenever a signal was detected.

Fig. 1. Schedule of “real” and of “artificial” signals. (Above baseline, the Mackworth schedule of 12
real signals which constituted a half-hour schedule 
for one S in the control condition. Below the base- 
line, 8 artificial signals. These artificial plus the real 
signals constituted a half-hour schedule for one S in 
the experimental condition. Values are in minutes.) 
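The injection rule described in the Method can be sketched as a small scheduling function. This is a hypothetical reconstruction under stated assumptions: the artificial interval cycle restarts after every real signal (as in the worked example, where the first artificial signal falls 2¼ min. after the real one at 3¾ min.), and an artificial signal that would coincide with or pass the next real signal is dropped. Only the first two artificial intervals (2¼ and 1¾ min.) are implied by the text’s example; the remaining values in `ARTIFICIAL_INTERVALS` are placeholders.

```python
from itertools import accumulate

# Mackworth (1950) intersignal intervals for one half-hour, in minutes.
MACKWORTH_INTERVALS = [0.75, 0.75, 1.5, 2, 2, 1, 5, 1, 1, 2, 3, 10]

# First two values follow the text's worked example; the last three are
# hypothetical placeholders, not the study's actual program.
ARTIFICIAL_INTERVALS = [2.25, 1.75, 1.5, 2.0, 2.5]

def real_signal_times(intervals):
    """Onset times (min) of real signals for one ordering of the intervals."""
    return list(accumulate(intervals))

def inject_artificial(real_times, artificial_intervals):
    """Fill gaps between real signals with artificial ones.

    Assumptions: the artificial cycle restarts after each real signal, and an
    artificial signal that would fall at or after the next real signal is
    dropped (only the real signal is presented).
    """
    artificial = []
    previous = 0.0
    for next_real in real_times:
        t = previous
        i = 0
        while True:
            t += artificial_intervals[i % len(artificial_intervals)]
            if t >= next_real:
                break
            artificial.append(t)
            i += 1
        previous = next_real
    return artificial

real = real_signal_times(MACKWORTH_INTERVALS)   # listed order, one half-hour
fake = inject_artificial(real, ARTIFICIAL_INTERVALS)
print(real[-1], len(real), len(fake))  # 30.0 12 7
print(fake)  # [10.25, 12.0, 19.25, 22.25, 24.0, 25.5, 27.5]
```

With these placeholder intervals one half-hour receives 7 artificial signals, whereas Fig. 1 shows 8, so the study’s actual program evidently differs in its interval values or injection details.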


²In a practical radar situation, the simultaneous occurrence of a “real” and an “artificial” signal would not constitute a problem. In such a situation, if either were detected, it would be the real signal in half the cases. In the case of the artificial signal being the one detected, knowledge of results would be given and the signal disappear, leaving the real signal on the display. A second point concerns the possibility of randomly injecting a predetermined number of artificial signals. In the case of the present experiment the probability of simultaneity of occurrence of a real and an artificial signal would be approximately 1 in 100,000. In such a situation, however, one must be prepared to accept less intersignal regularity.
[Fig. 2 plot: y-axis, percentage of “real” signals not reported; x-axis, half-hour (first to third). Curves: condition using artificial signals plus knowledge of results (N = 25); control condition (N = 25); original Mackworth data using knowledge of results (N = 25); original Mackworth control data (N = 25).]

Fig. 2. Percentage of signals not reported during three successive half-hour intervals for two conditions. (The original Mackworth data have been plotted for comparison.)


Through headphones (worn by Ss in both groups), knowledge of results was given by the E to Ss in the experimental group only. For the artificial signals this knowledge took the form of a statement of “correct,” “missed one,” or “false,” as appropriate. For the real signals the knowledge given was restricted to a statement of “correct” when a real signal was detected. The E made no response when a real signal was missed or when one was falsely reported.

Ss were 50 paid females, 25 being assigned randomly to each condition.


RESULTS 

Of the 900 real signals presented in each 
condition, the experimental group failed to 
report 92, while the control group failed to 
report 230. This difference is significant at the 
1% level of confidence. 
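The counts above follow from simple arithmetic: 25 Ss per condition, each shown 36 real signals (three half-hours of the 12-signal Mackworth schedule), gives 900 real signals per condition. The experimental group’s 10.2% unreported, quoted later in the Results, is 92 of those 900.

```python
SIGNALS_PER_HALF_HOUR = 12   # Mackworth schedule
HALF_HOURS = 3               # 1½-hr. task
SUBJECTS_PER_CONDITION = 25

presented = SIGNALS_PER_HALF_HOUR * HALF_HOURS * SUBJECTS_PER_CONDITION
missed = {"experimental": 92, "control": 230}

def percent_unreported(n_missed, n_presented):
    """Percentage of presented real signals that went unreported."""
    return 100 * n_missed / n_presented

for condition, n_missed in missed.items():
    print(f"{condition}: {percent_unreported(n_missed, presented):.1f}% of {presented} unreported")
# experimental: 10.2% of 900 unreported
# control: 25.6% of 900 unreported
```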

The manner in which the unreported real 
signals were distributed in time is shown in 
Figure 2 in terms of the percentage unre- 
ported of those presented each half-hour. The 
original Mackworth data for both his control 
and knowledge of results conditions have been 
plotted for comparison purposes. 

From Figure 2 it is apparent that the con- 
trol data are very similar to Mackworth’s, i.e., 
the major decrement occurred in the second 
half-hour. In the experimental condition, how- 
ever, there was no decrement in the second 
half-hour, though a downward trend is ap- 
parent in the third. Nevertheless, Ss in the 
experimental condition reported a larger percentage of signals in their third (poorest)
half-hour than did the control group in their 
first (best) half-hour. It can be noted, too, 
that performance in the experimental condition was superior to that of Mackworth’s
knowledge of results condition, 10.2 and 
17.3% of the signals presented being unre- 
ported, respectively. Unfortunately this dif- 
ference cannot be statistically tested. 

The data were transformed to radians and 
a variance analysis was made. The analysis 
indicated that for both groups more targets 
were unreported as a function of time, the 
relation being more marked for the control 
group. 
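The text says only that the data were “transformed to radians” before the variance analysis. A common reading, assumed here, is the arcsine square-root (angular) transformation of each proportion, which stabilizes the variance of proportions. Some authors use 2·arcsin√p instead; a constant multiplier rescales every sum of squares equally and so leaves F ratios unchanged. The example proportions are hypothetical.

```python
import math

def arcsine_transform(p):
    """Angular transformation of a proportion: arcsin(sqrt(p)), in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))

# Hypothetical per-subject proportions of the 12 real signals missed in a half-hour.
for missed in (0, 1, 3, 6):
    p = missed / 12
    print(f"{missed}/12 missed -> p = {p:.3f}, angle = {arcsine_transform(p):.4f} rad")
```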


DISCUSSION 


The purpose of this study was not to deter- 
mine the relative merits of increased signal 
frequency using artificial signals, versus 
knowledge of results in maintaining the level 
of vigilance, but rather to demonstrate that 
when both factors are employed monitoring 
performance is superior to the case when 
neither is employed. The one reported attempt (Garvey, Taylor, & Newlin, 1959) to determine the relative merits of these two factors found that probability of detection can be “improved greatly by inserting artificial signals,” while the increment in improvement credited to “knowledge relative to how well he is monitoring the artificial signals [was] only slight.” In the experiment quoted,
however, such knowledge was an inference 
made on the part of the observer from changes 
in the display. Other work (Baker, 1959b; 
Mackworth, 1950) has shown knowledge of 
results to be a potent factor in situations 
where signal frequency was not increased be- 
yond that for a control group. In any event, 
an extensive literature (Ammons, 1956) in 
the area of knowledge of results suggests that 
the efficacy of such knowledge is dependent 
upon the form of the knowledge, and in the 
experiment reported here we have attempted 
to put the knowledge in a form which might 
be employed in a practical situation. 

A second point concerns the nature of the 
artificial signals employed: need they be in- 
distinguishable from real ones? In one study 
(Wallis, 1958) they were deliberately de- 
signed to be discriminably different from real 
signals, and as such were found to be of 
negligible value. In the study referred to 
above (Garvey et al., 1959), however, it was 
found that “detection performance did not
suffer when a clear distinction was made be- 
tween real and artificial signals.” In the pres- 
ent study the signals were purposely made 
indistinguishable. The question remains un- 
answered. 

The major point in the present paper, how- 
ever, is that when signal frequency is in- 
creased by addition of artificial signals, and 
knowledge of results is given with respect to 
the artificial signals and to the real signals 
reported, vigilance is significantly superior to 
that of a control group. The program of the 
artificial signals in the experimental condi- 
tion was quite arbitrary. It appears not un- 
likely that determination of optima with re- 
spect to ratio of real to artificial signals, i.e., 
optimum signal frequency, degree of temporal 
regularity of artificial signals, form of knowl- 
edge of results, etc., would result in yet supe- 
rior performance in a vigilance task. 
SUPERVISORY RATINGS AND ATTITUDES
This report concerns a phenomenon which
we have noticed often in supervisory ratings 
and attitudes but have only recently quantified. We see this as a probably unconscious search by the supervisor for more and more
workers to promote, regardless of merit. We 
have interpreted it as an attempt by super- 
visors to escape or share their unique respon- 
sibilities and to follow the contemporary trend 
of major emphasis upon managerial ability. 
The Wrigley, Cherry, Lee, and McQuitty 
study (1957) of the ratings of aircraft mechanics provided us with a device for objective study in this area. Specifically, we hypothesized that: (a) the relative values the supervisors placed upon the 10 Wrigley et al. factors would be shown by the difference between the factor scores of superior and inferior groups of maintenance workers, and (b) how well the maintenance men met the standards of the supervisors would be shown by their factor ratings.


PROCEDURES¹


1. Forty-two male NP hospital housekeeping maintenance men (janitors) were ranked by their four male supervisors from 1, the most satisfactory worker, to 42, the least. The averages of these rankings were reasonably stable. The rho’s between the rankings of the various supervisors and the averages of the others were .72, .85, .77, and .81.

2. Next, the four supervisors and the division chief rated the 42 men on the 120 items of the Wrigley et al. scale. (Some minor rewordings were necessary to make all the items applicable to housekeeping maintenance men.) The items reported as correlating positively with the criterion (descriptive of the best aircraft mechanics) were scored +1; those correlating negatively (descriptive of the poorest mechanics) were scored −1. Algebraic totals for each of the 10 factors were turned into percentages of the highest scores possible.

3. Two extreme groups of workers were chosen from the rankings made by the supervisors. These were six of the most satisfactory employees who had no ranking by any supervisor as low as the average of the entire group, and seven of the least satisfactory who had no ranking by any supervisor as high as the average of the entire group. (In addition, there was no overlapping of these groups on 5-point scale ratings or on total scores on the Wrigley et al. scale.) The rigid “no overlapping” criterion accounted for the small number of cases. The likelihood that these two extreme groups were drawn from the same population (judged by average ranking) was less than .001 by the Mann-Whitney test.

¹ Zachery Mayberger, psychology trainee, University of Tennessee, assisted in collecting the data.

Administration Center, Hot Springs, South Dakota

4. The individual 10 factor scores of the most- and least-satisfactory groups were compared with each other and with the scores of the entire group, including them. The four supervisors and the division chief agreed very closely in their factor ratings of the men. Rho’s between their average factor ratings for the total group ranged from .95 to .97 (average .96).
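The Mann-Whitney result in step 3 can be reproduced exactly for small groups by enumeration. The sketch below counts U as the number of superior-inferior pairs in which the superior man has the larger (worse) average ranking, then enumerates every split of the 13 ranks into groups of 6 and 7. The average rankings are hypothetical; they are only required to show the complete separation that the “no overlapping” criterion guarantees. Under complete separation U = 0 and the exact one-tailed probability is 1/C(13, 6) = 1/1716, about .0006.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def mann_whitney_u(x, y):
    """U for sample x: pairs (xi, yj) with xi > yj; ties count one half."""
    u = Fraction(0)
    for a in x:
        for b in y:
            if a > b:
                u += 1
            elif a == b:
                u += Fraction(1, 2)
    return u

def exact_lower_tail_p(n1, n2, u_obs):
    """P(U <= u_obs) under the null hypothesis, assuming no ties.

    Enumerates every way ranks 1..n1+n2 could be split between the groups;
    U for the first group equals its rank sum minus n1(n1+1)/2.
    """
    n = n1 + n2
    hits = 0
    for ranks in combinations(range(1, n + 1), n1):
        if sum(ranks) - n1 * (n1 + 1) // 2 <= u_obs:
            hits += 1
    return Fraction(hits, comb(n, n1))

# Hypothetical average rankings (1 = most satisfactory), fully separated.
superior = [2.0, 3.5, 4.0, 5.5, 6.0, 8.0]
inferior = [36.0, 37.5, 38.0, 39.0, 40.5, 41.0, 41.5]

u = mann_whitney_u(superior, inferior)
p = exact_lower_tail_p(len(superior), len(inferior), u)
print(u)   # 0
print(p)   # 1/1716
```

The text does not say whether its test was one- or two-tailed; doubling 1/1716 for a two-tailed test gives about .0012.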
RESULTS 


Both hypotheses (see first paragraph above) 
were supported by the data. Mean factor 
scores for the six superior, the seven inferior, 
and the total group of 42 maintenance men 
are given in Table 1. The factors are listed in 
order of size of the difference between means 
of the superior and inferior groups. We think 
that this order shows the relative importance 
of these 10 factors as seen by the supervisors. 

Support for our second hypothesis is shown by a rho of −.92 between the mean factor scores of the total group and the order of importance of those factors as evidenced by the ratings of the superior and inferior groups. This near-perfect inverse relationship indicates that the abilities valued most by the supervisors were least abundant in the workers, and vice versa.
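The reported rho of −.92 can be recomputed from Table 1 of this report. The sketch below transcribes the Mean Total Group column and the Mean Difference column (whose size defines the order of importance), ranks each, and applies the textbook Spearman formula ρ = 1 − 6Σd²/[n(n² − 1)], which is exact when there are no ties, as here. The same function applies to the supervisor-versus-others rankings in step 1, though those raw rankings are not reported.

```python
def ranks(values):
    """Average ranks (1 = smallest); tied values share the mean position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)); exact only without ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Transcribed from Table 1, rows in the table's order (Leadership ... Personal charm).
total_means = [53.0, 57.9, 71.2, 64.4, 77.2, 64.5, 79.7, 79.6, 86.7, 86.2]
mean_differences = [81.0, 77.8, 72.8, 69.6, 55.8, 51.6, 51.3, 41.9, 27.9, 25.8]

rho = spearman_rho(total_means, mean_differences)
print(f"{rho:.3f}")   # -0.915
```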


SUMMARY AND DISCUSSION 


The rankings and ratings showed that the 
supervisors of our hospital maintenance men 
valued most the abilities least prevalent in the 
men (leadership and executive ability) and 
valued least the attributes most abundant in 
these men (social adjustment and personal charm). This position is so unrealistic that it requires no elaboration. Obviously, much executive and/or leadership ability would be a source of frustration and job dissatisfaction to most of the maintenance men. But good social adjustment and personal charm should be essential to such workmen in teams and especially around mental patients. We think that similar biases color the thinking of supervisors in other areas though the actual factors involved may be differently ordered in different occupations.

TABLE 1
SUPERVISORS’ RATINGS OF HOUSEKEEPING MAINTENANCE MEN

                                Mean       Mean       Mean         Mean
                                Superior   Inferior   Difference   Total
Factor description              Group      Group                   Group (a)
Leadership                      97.1       16.1       81.0         53.0
Executive ability               98.8       21.0       77.8         57.9
General job efficiency          98.7       25.9       72.8         71.2
Ability to motivate others      …          29.7       69.6         64.4
Willingness and adaptability    …          43.9       55.8         77.2
Resourcefulness                 …          41.1       51.6         64.5
Mechanical proficiency          …          48.4       51.3         79.7
Orderliness                     98.3       56.9       41.9         79.6
Social adjustment               98.5       70.6       27.9         86.7
Personal charm                  97.8       72.0       25.8         86.2

(a) Total means differ at the .02 level or better except between Factors 4 and 3, 1 and 9, and 10 and 8.

REFERENCE

Wrigley, C., Cherry, C. N., Lee, M. C., & McQuitty, L. L. Use of the square root method to identify factors in the job performance of aircraft mechanics. Psychol. Monogr., 1957, 71(1, Whole No. 430).

(Received December 31, 1959)
The prediction of success or failure in college is typically based on some measure or
sets of measures related to intelligence of the 
student—high school grade-point average, an 
achievement test, or a standardized intelli- 
gence test. It seems reasonable to assume, 
however, that a student with serious emo- 
tional difficulties would be less likely to suc- 
ceed than a student without such difficulties, 
intellectual status being equal. Accordingly, 
the ability to identify the potentially mal- 
adjusted student should increase efficiency of 
prediction of academic success, an especially 
important goal at a time when increasing 
college attendance and competition for admis- 
sion are anticipated. Furthermore, early iden- 
tification of maladjustment may allow for 
more effective remediation or guidance through 
university counseling facilities. 

One avenue toward the identification of po- 
tential maladjustment is to investigate per- 
sonality differences between groups of college 
students clearly varying in adjustment level. 
Whatever personality pattern appears best to 
postdict emotional maladjustment can then be 
evaluated as to its predictive usefulness. How- 
ever, there seem to be surprisingly few studies 
dealing with personality differences between 
college students differing in level of adjust- 
ment. Most of the investigative work in this 
area has utilized the Minnesota Multiphasic 
Personality Inventory as the measuring in- 
strument, but for varying reasons these stud- 
ies have made somewhat limited contributions 
to clarifying this particular problem. Mello 
and Guthrie (1958) concluded that malad- 
justed college males “admit to more symptoms 
of anxiety and inferiority” and females tend 
“to be disturbed about interpersonal relation- 
ships” based on MMPI performance, but no 
comparison with a control group of adjusted 
students was made. Sloan and Pierce-Jones 
(1958) compared counseling center clients 
with college normals, but the obtained differences on the MMPI were based in some instances on so few cases (e.g., two) that the stability of many of the findings can be seriously questioned. The investigations of Black
(1956) and Drake (1954) have limited rel- 
evance to the problem of personality differ- 
ences in adjusted and maladjusted college 
students. The former was intended to clarify 
the meaning of MMPI scores in normal col- 
lege Ss and thus used no maladjusted group, 
while the latter studied specified subgroups 
within a larger maladjusted group and made 
no comparisons with normals. 

The study most directly related to the pres- 
ent one was reported by Merrill and Heathers 
(1954) who compared self-descriptions on a 
58-item list of adjectives of a group of coun- 
seling center clients “showing some emotional 
disturbance” with another group of clients 
who did not. Although several adjectives were 
differentially endorsed by these two groups, 
the investigators cautiously limited themselves 
to the broad conclusion that the more poorly 
adjusted group “marked more items reflecting 
dissatisfaction with themselves in their social 
and personal relationships.” Accordingly, no 
specific inferences about personality differ- 
ences were provided. 

The purpose of the present study was to 
evaluate the personality differences between 
emotionally maladjusted and adjusted college 
students by two methods: comparing Need 
scale scores on the Gough Adjective Check 
List (1955) for groups of students differing in 
level of adjustment, and using adjustment 
ratings of personality traits provided by ex- 
pert judges. 


METHOD 


Need scales. The Gough Adjective Check List (ACL) includes 300 adjectives and involves asking S to endorse those which he feels are self-descriptive. Heilbrun (1958, 1959) has developed 15 Need scales for the ACL using manifest needs originally described by Murray (1938) and further elaborated in the construction of the Personal Preference Schedule (Edwards, 1954). The adjectives on the ACL judged to indicate the presence of each need and those judged to contraindicate the presence of each need were combined to form adjective clusters. Each need score was obtained by subtracting the number of contraindicative adjectives from the number of indicative adjectives endorsed by S. The Need scales tend to show moderate intercorrelations, probably in large measure due to overlapping adjectives which judges rated as indicative or contraindicative of the same need trait or indicative of one and contraindicative of another in the original derivation of the scales. The median absolute intercorrelation for male college students is .43 and for female college students is .30. Test-retest reliabilities for the 15 scales over a 244-mo. period range from .71 (Succorance) to .90 (Nurturance) for a combined male-female college group (N = 100). Total time to complete the ACL averages about 10 min. and the completely objective scoring of the Need scales entails about 15 min. The names of the traits measured by the scales are given in Table 1.

Subjects. For purposes of this study, the maladjusted group is defined as those college Ss who sought help at the University Counseling Service for problems of a personal nature. Fifty male Ss (mean age = 22.86 yr.) and 50 female Ss (mean age = 21.08 yr.) were included in this group, these Ss representing a consecutive run of personal cases seen in the service. The test was administered individually as a part of a standard intake battery.

The adjusted group included Ss obtained from various undergraduate psychology courses and may be regarded as a fairly representative sample from the general university undergraduate population. This group was made up of 97 male Ss (mean age = 21.43 yr.) and 109 female Ss (mean age = 20.35 yr.). The ACL was group administered to the adjusted Ss under somewhat impersonal research conditions.
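The Need scale scoring rule in the Method (indicative adjectives endorsed minus contraindicative adjectives endorsed) can be sketched in a few lines. The adjective keys below are hypothetical illustrations, not Heilbrun’s published ACL scoring keys.

```python
def need_score(endorsed, indicative, contraindicative):
    """Indicative adjectives endorsed minus contraindicative adjectives endorsed."""
    chosen = set(endorsed)
    return len(chosen & set(indicative)) - len(chosen & set(contraindicative))

# Hypothetical adjective keys for a single need; NOT the published ACL keys.
achievement_plus = {"ambitious", "industrious", "persistent"}
achievement_minus = {"lazy", "careless"}

endorsed_by_s = ["ambitious", "persistent", "lazy", "cheerful"]
print(need_score(endorsed_by_s, achievement_plus, achievement_minus))  # 1
```

Adjectives keyed to neither list (here “cheerful”) do not affect the score. An adjective appearing in the keys of several needs would count toward each, which is one way overlapping keys produce the intercorrelations mentioned above.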

A second control group, to be referred to as the adjusted counseling group, was made up of 50 males (mean age = 21.54 yr.) and females (mean age = 19.72 yr.) who sought help in the University Counseling Service for problems of a vocational or educational nature, but not for personal adjustment problems. Since these Ss did not seek help for personal problems, it seems safe to conclude that as a group they are less maladjusted than Ss in the maladjusted group. The second control group was necessary because the primary comparison groups, the adjusted and maladjusted, differed in two possibly important respects other than along the adjustment continuum. First, they were given the ACL under different testing conditions, and, second, the maladjusted Ss had all sought help at a service agency. The addition of the adjusted counseling group, the Ss of which had sought help and were given the test under the same conditions as the maladjusted group, provided an opportunity to control for these two factors.

Adjustment ratings. Twenty-four psychologists on 
the faculty of the State University of Iowa (with an 
average of over nine staff years in a college setting) 
acted as judges. Their task was to judge whether a 
college undergraduate showing more of each trait be 
havior measured by the ACL Need scales would 
likely be: more poorly adjusted, neither more poorly 
nor better adjusted, or better adjusted than a student 
showing less of the trait behavior. The judges were 
cautioned not to consider behavioral 
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their judgments but rather to make their compar- 
isons between persons “fairly strong” on a trait and 
persons “fairly weak” on the same trait. Specific be- 
haviors characterizing each need, obtained from Ed- 
wards (1954), were provided for the judges. Each 
need was rated independently of the other needs and 
separate judgments were made with to 
male and female college Ss 

Adjustment values were assigned to the need be- 
haviors by giving the following weights to the judges’ 
ratings: poorer adjustment = 1, neither poorer nor 
better = 2, better adjustment = 3. By pooling ratings 
over the 24 judges for each need X sex category and 
obtaining a mean judgment, a final adjustment value 
was determined. It was decided a priori to define any 
need with a final adjustment value falling between 
1.00 and 1.74 as maladjustive, those with these values 
falling between 2.26 and 3.00 as adjustive, and those 
needs with values in the 1.75 to 2.25 range as neither 
adjustive nor maladjustive. 
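The scoring rule just described can be expressed as a short computation. The Python sketch below is illustrative only: the function names and the sample ratings are hypothetical, and only the 1/2/3 weights and the cutoffs come from the text. With 24 judges every mean is a multiple of 1/24, so no value can fall in the small gaps between 1.74 and 1.75 or between 2.25 and 2.26; the sketch therefore uses 1.75 and 2.25 as the dividing points.

```python
from statistics import mean

# Weights stated in the text for each judge's rating.
WEIGHTS = {"poorer": 1, "neither": 2, "better": 3}


def adjustment_value(ratings):
    """Mean weighted judgment over all judges for one need x sex category."""
    return mean(WEIGHTS[r] for r in ratings)


def classify(value):
    """Apply the a priori cutoffs (1.00-1.74, 1.75-2.25, 2.26-3.00)."""
    if value < 1.75:
        return "maladjustive"
    if value > 2.25:
        return "adjustive"
    return "neither adjustive nor maladjustive"


# Hypothetical example: 18 of 24 judges say "better", 6 say "neither".
ratings = ["better"] * 18 + ["neither"] * 6
value = adjustment_value(ratings)
print(f"{value:.2f} -> {classify(value)}")  # 2.75 -> adjustive
```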




RESULTS 


Table 1 presents the final adjustment need
values, the Need scale mean scores and standard
deviations, and the results of t tests for
the male adjusted, maladjusted, and adjusted
counseling group comparisons. Inspection of
the t values for the adjusted-maladjusted
group comparisons shows that the ACL Need
scales provided statistically reliable differences
for 9 of the 15 traits. The judges’ ratings
of these needs defined 10 of the 15 as
adjustive or maladjustive in a male college stu- 
dent and showed strikingly consistent agree- 
ment with the test findings. Thus by both 
criteria of adjustment, the maladjusted col- 
lege male should be characterized by lesser 
achievement, order, affiliation, dominance, 
nurturance, and endurance needs and greater 
succorance, abasement, and aggression needs 
than his adjusted counterpart. The judges 
rated the heterosexual need as being posi- 
tively related to adjustment and the adjusted 
group was appropriately higher on the Hetero- 
sexuality scale than the maladjusted group, 
the difference approaching significance at the 
.06 level of confidence. No significant Need 
scale differences were found on the traits de- 
fined as neither adjustive nor maladjustive— 
deference, exhibition, autonomy, intraception, 
and change. Accordingly, there was consistent 
agreement as to which need behaviors should 
not differentially characterize adjusted and 
maladjusted male college students as well. 
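The text does not state which form of the t test was used. As a hedged illustration, assuming Student's pooled-variance test for two independent groups, a t value can be recovered from the kind of means, standard deviations, and group sizes reported in Table 1. The function name and the numbers in the example are hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, assuming equal population variances.

    Returns (t, degrees_of_freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


# Hypothetical scale scores: adjusted group vs. maladjusted group.
t, df = pooled_t(52.0, 10.0, 50, 48.0, 10.0, 50)
print(f"t({df}) = {t:.2f}")  # t(98) = 2.00
```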

Table 2 presents the final pooled judgments
and the relevant Need scale data for the
adjusted, maladjusted, and adjusted counseling
females. It can be seen that 7 of the 15
Need scales provide significant differences be- 
tween female adjusted and maladjusted Ss. 
The judges rated 12 of the 15 needs as having 
adjustive or maladjustive significance. There 
was agreement between the two criteria with
regard to five needs—greater order, nurtur- 
ance, and endurance as adjustive, and greater 
autonomy and abasement as maladjustive. 
The adjusted and maladjusted female groups 
differed reliably on two scales which the 
judges considered as having no adjustive sig- 
nificance (i.e., maladjusted Ss higher on Suc- 
corance and Change). On the other hand, no 
significant test differences were found for 
seven need variables judged to be adjustive
(achievement, deference, affiliation, intracep- 
tion, and heterosexuality) or maladjustive 
(exhibition and aggression). 

Since the maladjusted and adjusted Ss dif- 
fered not only in level of adjustment but also 
in having sought help or not and in being 
individually vs. group tested, the importance 
of these differences in contributing to the 
differential Need scale patterns required further
analysis. The scale scores of adjusted
counseling Ss were compared with those of the
maladjusted Ss, these groups being matched
on the help-seeking and mode-of-testing variables
but differing in level of personal
adjustment. Inspection of the t values in the
last columns of Tables 1 and 2 shows that the 
differences between adjusted counseling and 
maladjusted Ss are similar to those found 
between adjusted and maladjusted Ss. This 
clearly suggests that level of adjustment was 
the major determinant of the test differences 
in the adjusted-maladjusted analysis. One 
further analysis was performed which bears 
upon the assumption that adjusted counseling 
Ss approach adjusted Ss in adequacy of per- 
sonal adjustment. The Need scale scores for 
these two groups were compared and none of 
the differences for either males or females 
reached the 5% level of significance. 

Since the scales on the ACL are not inde- 
pendent, the question can be raised as to 
whether the obtained test differences from the 
adjusted vs. maladjusted male and female 
comparisons might instead be accounted for by a
cluster of highly intercorrelated scales. To 
test for this possibility, the median absolute 


Alfred B. Heilbrun, Jr.


intercorrelation between all scales providing 
significant or near-significant differences for 
males and for females was determined. The 
median absolute intercorrelation for all com- 
binations of the 10 ACL Need scales for males 
was .47 and this median value for females was 
.24. Although these figures suggest a tendency 
towards scale clustering based upon some 
common source(s) of variation, there appears 
to be sufficient independence among the dis- 
criminating scales to consider the various be- 
tween-group differences in terms of distin- 
guishable categories of manifest behavior. 
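The clustering check described above reduces to taking the median of the absolute off-diagonal entries of the intercorrelation matrix, counting each pair of scales once (45 pairs for 10 scales). The sketch below assumes the matrix is available as a nested list; the function name and the three-scale matrix in the example are hypothetical.

```python
from itertools import combinations
from statistics import median


def median_abs_intercorrelation(matrix):
    """Median |r| over all distinct pairs of scales in a square correlation matrix."""
    k = len(matrix)
    pairs = combinations(range(k), 2)  # each unordered pair once, diagonal excluded
    return median(abs(matrix[i][j]) for i, j in pairs)


# Hypothetical 3-scale matrix (not data from the study).
r = [
    [1.00, 0.50, -0.20],
    [0.50, 1.00, 0.30],
    [-0.20, 0.30, 1.00],
]
print(median_abs_intercorrelation(r))  # 0.3
print(len(list(combinations(range(10), 2))))  # 45 pairs for 10 scales
```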


DISCUSSION 


It seems clear that male college Ss’ self- 
descriptions on the ACL, when scored by 
Need scales, allow for a discrimination be- 
tween groups differing in level of personal 
adjustment. Not only do adjusted and mal- 
adjusted students show different patterns of 
needs on the test, but the differences correspond
remarkably well to a modal pattern of
adjustment as judged by psychologists. Based 
upon both the psychologists’ judgments and 
the empirically obtained differences on the 
ACL, one might expect the personally mal- 
adjusted college male to show the following 
personality characteristics relative to an ad- 
justed college male: lower need for achieve- 
ment; less orderly; less likely to seek out 
friends; more desirous of being cared for; 
less dominant in his personal relationships; 
more likely to feel inferior, timid, and inade- 
quate in relating to others; less able to see 
something through once it is started; more 
aggressive; and perhaps less driven hetero- 
sexually. 

There was less agreement between the psy- 
chologists’ judgments and the results of direct 
test comparison between adjusted and mal- 
adjusted Ss in the case of females. However, 
consistencies between judgment and empirical 
findings would lead one to expect the follow- 
ing personality characteristics of maladjusted 
college females relative to adjusted: less or- 
derly; less conforming or conventional and 
more independent; more likely to feel inferior, 
timid, and inadequate in relating to others; 
less willing or able to give in a concrete or 
emotional way to others; and less able to see 
something through once it is started. Two 
further characteristics suggested by test differences
but not by the ratings would be a
greater need to be taken care of and a greater 
need for novelty or change in their environ- 
ment on the part of maladjusted college 
females. 

It is not clear whether the lesser congruence
between test differences and psychologists’
judgments with the female group than with
the male is primarily a function of the adjustment
criteria or reflects a characteristic of the
female maladjusted group relative to the
male. It seems quite possible that the degree
of maladjustment was in fact less with the
females than with the males and that this
tended to suppress the Need scale differences
between the female adjusted and maladjusted
groups. It can be noted in Table 2 that the
scale differences were in the appropriate direction
in all seven cases where the judges
rated a need as adjustive or maladjustive but
the test reflected no reliable differences. One
basis for expecting a difference in actual de- 
gree of maladjustment when maladjusted 
groups were defined as they were in this study 
would be the sociocultural masculinity ster- 
eotype which might make the male more re- 
luctant to admit his inability to cope with his 
personal problems by seeking help at a coun- 
seling service than would be the case with the 
female. The logical effect of this reluctance 
would be to generate a male counseling group 
with more serious personal problems than a 
female counseling group. Support for the con- 
tention that males are more reluctant to seek 
help for personal problems is found in the 
fact that for a recent 15-month period only 
slightly more than half (52%) of the personal 
adjustment clients at the University Counsel- 
ing Service were males, while for the same 
period over two-thirds (68%) of the student
population at the State University of Iowa
were males. 

Although the results of this investigation
were interpreted in terms of personality differences
between personally adjusted and
maladjusted college students, serious consideration
must be given to the alternative possibility
that the results can be accounted for in
terms of the social desirability of the behaviors
under consideration. In line with this
hypothesis, the ACL test differences would be
seen as determined by the lesser tendency of
maladjusted Ss to portray themselves in a
socially desirable light relative to normal college
students, and the judges’ ratings of
adjustment would be seen as based upon the
desirability of the behaviors rated, not their
adjustive value.

TABLE 3
CORRELATIONS BETWEEN THE ACL NEED SCALES,
EDWARDS PERSONAL PREFERENCE SCHEDULE
VARIABLES, AND MMPI K SCALE

With regard to the relationship between 
social desirability of test responses and their 
probability of endorsement, there is consider- 
able empirical evidence that the former ac- 
counts for a considerable amount of the per- 
formance variation on objective personality 
tests (Edwards, 1954; Hanley, 1956; Heil- 
brun, 1958; Rosen, 1956). However, it was 
found that the ACL Need scales are relatively 
uncorrelated with the K scale of the MMPI, 
a generally accepted measure of social desir- 
ability set. Table 3 presents the correlations 
between K scale and ACL scores for separate 
groups of 100 male college students and 100 
female college students (the majority of each 
group being Ss in the present study) as well 
as the correlations between K and these per- 
sonality variables reported for the Edwards 
Personal Preference Schedule (1954) which 
includes a forced-choice format to minimize
social desirability as a source of response
variance. It can be seen that for only 2 of the
30 correlations between K and the Need
scales (K vs. Succorance for females, K vs.
Aggression for males) does the K score account
for more than about 5% of the response
variance on the ACL scales, and even in these
two cases the common variance is small (16%
and 17%). Since the ACL Need scales appear to
be relatively unaffected by the social desir- 
ability response set of the S, the hypothesis 
that the obtained test differences between ad- 
justed and maladjusted Ss in this study can 
be attributed to differences in this response 
set is not supported. 
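The percentages in this paragraph are squared correlations, that is, the proportion of variance two measures share. Working backward, a shared variance of about 5% corresponds to |r| of roughly .22, and 16% and 17% correspond to roughly .40 and .41. These coefficients are implied by the percentages, not values read from Table 3, and the helper names below are illustrative.

```python
import math


def shared_variance(r):
    """Proportion of variance two measures have in common: r squared."""
    return r * r


def r_for_shared_variance(proportion):
    """Absolute correlation that yields a given shared-variance proportion."""
    return math.sqrt(proportion)


for pct in (5, 16, 17):
    print(f"{pct}% shared variance -> |r| = {r_for_shared_variance(pct / 100):.2f}")
```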

In a recent study by Wiener, Blumberg, 
Segman, and Cooper (1959), the investigators 
report a correlation of .88 between psychol- 
ogists’ adjustment ratings and social desir- 
ability ratings of Q sort items. Similarly, the 
rank order correlations between the final ad- 
justment values for the 15 ACL needs eval- 
uated in this study and students’ ratings of 
the desirability of these needs (data from 
Heilbrun, 1958) were .86 for females and .84 
for males. Wiener et al. hypothesize that their 
high correlation reflects a tendency for judges, 
in making judgments about the adjustment 
value of behaviors in the absence of under- 
lying genotypical factors, to “resort to a con- 
cept of social acceptability of specific behaviors.”
What would seem to be an equally
plausible hypothesis to account for these high
correlations would be that adjustive behaviors 
and socially desirable behaviors are, in fact, 
highly overlapping behavioral classes. It seems 
clear, however, that this issue cannot be set- 
tled in any a priori fashion and further re- 
search is necessary. 
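The rank order correlations cited here (.86 and .84) compare two orderings of the same 15 needs. A minimal sketch, assuming the usual Spearman coefficient computed as the Pearson correlation of ranks, with tied values given their average rank; the function names and the five-need values below are invented for illustration and are not the study's data.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            result[order[pos]] = average_rank
        start = end + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented values for five needs (not the study's data).
adjustment_values = [2.75, 1.50, 2.10, 2.40, 1.20]
desirability_ratings = [6.1, 3.0, 4.9, 4.8, 2.2]
print(f"rho = {spearman_rho(adjustment_values, desirability_ratings):.2f}")  # rho = 0.90
```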


SUMMARY 


The present study investigated the person- 
ality differences between adjusted and mal- 
adjusted college students using the Need 
scales scored on the Gough Adjective Check 
List and the judgments of psychologists as 
criteria. Groups of male and female Ss who 
had sought help at a college counseling service 
for personal problems were defined as maladjusted
and their Need scale scores were
compared with samples of Ss from the general 
college population which were defined as the 
adjusted groups. The pooled ratings of 24 
psychologists were used to determine the ad- 
justment values associated with the behaviors 
characterizing each need. 

Results of the male adjusted vs. maladjusted
group comparisons on the ACL showed
that these groups differed reliably on nine
scales with very close agreement between
these empirical findings and the psychologists’
judgments. For the female comparison, there
was agreement between test findings and
judges’ ratings on five of the needs.

Consideration was given to why the test-determined
differences between adjusted and maladjusted
Ss agreed more closely with the psychologists’
adjustment ratings for males than for females,
as well as to the possibility that the present
results might be accounted for by a social
desirability factor.
A SIMPLIFICATION OF HAY’S METHOD OF RE

Recently, Hay (1958) presented a simplified
method of recording paired-comparison
judgments. His method has the distinct ad- 
vantage of presenting all comparisons on the 
same sheet of paper rather than on numerous 
paired stimulus cards; however, his method is 
laborious, especially for large numbers of 
stimuli. The present paper presents another 
simplification of this method, requiring the 
same number of notations but reducing by one 
half the number of comparisons necessary. 

Figure 1 presents the method proposed by 
Hay, showing the tally marks resulting from 
the comparison of every stimulus with every 
other stimulus. Reading horizontally, the 
stimulus which best suits the criterion (social 
status in this case) is indicated by a tally 
mark in the row to the right of that stimulus 
in the column under the stimulus with which 
it was compared. Unfortunately, in a sym- 
metrical chart such as this, two comparisons 
are required of every stimulus with every 
other stimulus. A total score for each stimulus
is derived by counting the number of tally
marks to the right of each stimulus. To arrive
at stimulus scores using such a chart,
N(N - 1) comparisons must have been made.
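Hay's chart can be simulated to make the comparison count concrete. In this Python sketch the judge is modelled by a hypothetical function `prefer(a, b)` that returns whichever stimulus better suits the criterion, and the job titles and their status order are invented for the example. Every ordered pair off the diagonal is judged, so N stimuli require N(N - 1) comparisons, and a stimulus's score is the number of tallies in its row.

```python
def hay_chart(stimuli, prefer):
    """Full symmetric chart: every ordered (row, column) pair off the diagonal is judged."""
    chart = {row: {} for row in stimuli}
    comparisons = 0
    for row in stimuli:
        for col in stimuli:
            if row == col:
                continue  # diagonal cells are never used
            comparisons += 1
            # A tally goes in the row stimulus's row, under the column stimulus,
            # when the row stimulus is judged higher on the criterion.
            chart[row][col] = 1 if prefer(row, col) == row else 0
    scores = {row: sum(chart[row].values()) for row in stimuli}
    return scores, comparisons


# Hypothetical judge: ranks job titles by an invented status order.
STATUS = {"Attorney": 4, "Teacher": 3, "Skilled Tradesman": 2, "Laborer": 1}


def prefer(a, b):
    return a if STATUS[a] > STATUS[b] else b


scores, comparisons = hay_chart(list(STATUS), prefer)
print(scores)       # {'Attorney': 3, 'Teacher': 2, 'Skilled Tradesman': 1, 'Laborer': 0}
print(comparisons)  # 12
```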

Figure 2 presents the method proposed 
here. Only the area of the chart above the 
diagonal is retained and for every comparison 
a notation is made. The notations in this case 
are the first letters of the job titles compared, 
but in other situations one might use the 
initials of the persons receiving the positive 
discriminations, a code number previously 
placed in parentheses after each stimulus to
be compared (especially in the case of large 
numbers of comparisons or redundant ini- 
tials), etc. For example, in the comparison of 
Skilled Tradesman with Attorney an A is 
entered in the comparison column and row 
indicative of the fact that Attorney is higher 
in social status than Skilled Tradesman. After 
recording the discriminations, the total score 
for each stimulus is ascertained by counting 
the stimulus’ “initials” down the chart in a 
given stimulus column to the diagonal and 
then across the chart in the corresponding
stimulus row. The present method requires
only N(N - 1)/2 comparisons and permits
the design of a considerably abbreviated comparison
chart.

Comparing the two methods, it appears that
the method presented here is probably not
much quicker to complete than the Hay method
for small numbers of comparisons; however,
for large numbers of comparisons the
time and effort saved and the number of comparisons
obviated by the proposed method
could amount to appreciable savings. As Hay
suggests, a quick check on the number of comparisons
that should have been made is the
sum of the discriminations in favor of each
stimulus. This sum should equal N(N - 1)/2.
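The proposed chart and its bookkeeping check can be sketched the same way. Only cells above the diagonal exist, each holds the code of the preferred stimulus, and a stimulus's score is the count of its code down its column to the diagonal plus across its row. As in the text's example, first letters serve as codes here, which only works when they are unique, so the sketch raises an error otherwise. The job titles, their status order, and the `prefer` judge function are hypothetical.

```python
def triangular_chart(stimuli, prefer):
    """Upper-triangle chart: cell (i, j), i < j, holds the code of the preferred stimulus."""
    codes = {s: s[0] for s in stimuli}  # first letters as codes, as in the text's example
    if len(set(codes.values())) != len(stimuli):
        raise ValueError("initials are not unique; use code numbers instead")
    chart = {}
    for i in range(len(stimuli)):
        for j in range(i + 1, len(stimuli)):
            chart[(i, j)] = codes[prefer(stimuli[i], stimuli[j])]
    return chart, codes


def score(stimuli, chart, codes):
    """Count each stimulus's code down its column to the diagonal, then across its row."""
    n = len(stimuli)
    scores = {}
    for k, s in enumerate(stimuli):
        column = [chart[(i, k)] for i in range(k)]     # cells above the diagonal in column k
        row = [chart[(k, j)] for j in range(k + 1, n)]  # cells right of the diagonal in row k
        scores[s] = (column + row).count(codes[s])
    return scores


STATUS = {"Attorney": 4, "Teacher": 3, "Skilled Tradesman": 2, "Laborer": 1}


def prefer(a, b):
    return a if STATUS[a] > STATUS[b] else b


stimuli = list(STATUS)
chart, codes = triangular_chart(stimuli, prefer)
scores = score(stimuli, chart, codes)
n = len(stimuli)
# Check from the text: the scores must sum to N(N - 1)/2, the number of comparisons made.
assert sum(scores.values()) == len(chart) == n * (n - 1) // 2
print(scores)  # {'Attorney': 3, 'Teacher': 2, 'Skilled Tradesman': 1, 'Laborer': 0}
```

Storing codes rather than tallies is what lets one cell serve both members of a pair, which is where the halving of comparisons comes from.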

In the absence of more objective evidence
of job performance, ratings of an individual 
by supervisors or superiors are a common 
criterion. The armed forces have made exten- 
sive and consistent use of this type of meas- 
ure in evaluating the effectiveness of their 
officers. These Officer Effectiveness Ratings 
(OERs) are the basis for many personnel 
actions and have an important effect on an 
officer’s career. 

The present study was undertaken to deter- 
mine the predictability of OERs by informa- 
tion obtained while the subjects (Ss) were in 
primary pilot training. The investigation was 
part of a larger research program designed to 
develop predictors of adaptability to the Air 
Force (Sells, 1951, 1955; Sells & Barry, 
1953). Adaptability as used in this research 
refers to temperamental and motivational 
characteristics, such as emotional disturbance 
or program-oriented motivation deficit, which 
contribute to a man’s success or failure in 
training and his continued adjustment to 
military flying (Trites, Kubala, & Cobb, 
1959). 

Based upon an hypothesis first stated by 
Sells (1951) and subsequently confirmed by 
Kubala (1958) with a type of posttraining 
criterion measure other than OERs, it was 
predicted that training-level adaptability
measures would be more highly related to
the OERs than would variables reflecting
primarily aptitude or ability factors. If the
prediction were not supported, the usefulness
of the training-level adaptability measures as
preliminary criteria for use in research would
be in doubt. In the latter instance, even
though the variables might still be considered
measures of adaptability in some sense, as far
as the Air Force is concerned they would not
be measures of adaptability predictive of
OERs.

1 This study was begun at the School of Aviation
completed at the Personnel Laboratory, WADD, Det

Since some of the training-level adaptabil- 
ity measures were derived from assessments 
made by psychologists at the time the Ss 
graduated from primary pilot training, a sec- 
ondary purpose of the study was determina- 
tion of the validity of clinical evaluations as 
compared with assessments made by peers, 
superiors, other experts, and objective in- 
formation. The opportunity to investigate the 
validity of these various types of data seems 
unique with respect to the size of the samples 
involved and the time periods covered. 

The development of the preceding taxonomy 
for the description of criterion data has been 
presented in other reports (Trites et al., 
1959; Trites & Sells, 1957). For the purposes 
of this report, all variables were categorized 
as being assessments by experts, peers, supe- 
riors, or as objective information. 


PROCEDURE 


The design of the study involved the development 
of a regression equation for the prediction of OERs 
in a sample of several hundred cases and the valida- 
tion of the equation on a completely independent 
sample. The variables, samples, and statistical meth- 
odology are described separately. 

Samples. The Ss forming the experimental sample 
completed primary pilot training² at Randolph Air
Force Base, Texas. They subsequently completed 
basic pilot training at other Air Force bases and re- 
ceived their pilot ratings during 1950 through 1952. 
Data were available for subgroups of varying size
obtained from a pool of 666 Ss who went through 
training as aviation cadets. 

A second group of aviation cadet Ss completed 
primary pilot training at Greenville Air Force Base, 
Mississippi. They subsequently completed basic pilot
training at other Air Force bases and received their
pilot ratings during 1952 and 1953. When reference is
made to calculations based on all data available for
aviation cadet Ss trained at this base, the sample will
be referred to as the total Greenville sample. From
this larger group, only 69 Ss had all of the data for
the variables used in the regression equation computed
on the Randolph experimental sample. These
69 Ss will be referred to as the Greenville validation
sample for the regression equation.

2 At the time data used in this research were being
collected, pilot training was divided into two periods,
each having a duration of approximately six months.
The first period was called primary training, the
second period was referred to as either basic or advanced
training.

The Ss in all samples were between the ages of 19 
and 26, all had passed a rigorous physical examina- 
tion, and many had met aptitude requirements based 
upon various Air Force selection test batteries. 

Although large groups of officers were in training 
at the same time as the aviation cadets, much less 
training level information was available for them. 
Consequently, officer Ss were used in this study only 
in conjunction with the synthesis of the OER crite- 
rion. 

Variables. Kubala (1958) has described the factor 
analysis of a number of training-level criteria. The 
major factors which he identified were essentially 
replicated in a subsequent study (Trites et al., 1959), 
although the correspondence of the variables in the 
two analyses was not perfect. 

Since there were almost no OER data available for 
the sample used in the replication, three global in- 
dices of adaptability and 10 variables representing 
the common factors were selected from the sample 
used in Kubala’s (1958) analysis. These variables are: 
Objective: 

1.³ Pilot Stanine (Pil Sta)—Estimate of S’s flying
aptitude derived from a battery of paper-and-pencil
and psychomotor tests. Scores are on a nine-point
normalized scale. High score = high aptitude.

2.³ Officer Quality Stanine (OQ Sta)—Estimate of
S’s officer-like qualities derived from same battery of
tests as Pil Sta. Sometimes interpreted as measure of 
intelligence. Scores are on a nine-point normalized 
scale. High score = high aptitude. 

3. Demerit Stanine (Dem Sta)—Represents total 
number of demerits accrued during primary pilot 
training. Scores are on a normalized nine-point scale. 
For experimental sample, high score = many demerits;
for Greenville group, high score = few demerits.

4.³ Flying Grade (Fly Gr)—Normalized nine-point
score derived from final numerical flying grades assigned
in primary pilot training. High score = good
grades.


Expert Ratings: 

1. Adjustment Rating (AR)—Derived from global 
evaluations of motivation and personality made by 
psychologists at the completion of primary pilot 
training. Evaluations were based on interview and 
psychological tests. Scores are on a nine-point scale, 
ranging from 1 (inadequate motivation and person- 
ality) to 9 (excellent motivation and personality). 


3 These variables are considered to be ability or
aptitude measures. All others are considered to be
measures of adaptability.

2. Actuarial Adjustment Rating (AAR)—Based
upon linear sum of specific trait ratings made by 
psychologists at the end of primary pilot training. 
High score = good adjustment. 

3. Adjustment Group (AG)—Rating made by a 
psychologist using all information available on Ss at 
the end of pilot training. 

4. Personal Adjustment (Pers Adj)—Represents a 
factor found in analysis (Kubala, 1958) of training 
level criterion data. Derived from sum of ratings by 
psychologist on two variables. Scores range from 
0 to 6. High score = good adjustment. 

5. Military Aptitude Rating: Tactical Department
(MAR Tac)—Military Aptitude Ratings were evaluations
of an S made on a 25-item checklist, each
item having 5 points. Ratings were summed over
raters for each S and converted to a normalized
nine-point scale. High score = high aptitude ratings. Only
MARs from instructors in the tactical department
were considered to be ratings by experts, since only
the tactical department personnel were trained to
give such ratings and were in a position to observe
appropriate behavior.


Superior Ratings: 


1. Military Aptitude Rating: Upperclass (MAR
Up)—(See description of MAR Tac) Upperclassmen
were men in training in the class immediately ahead
of the Ss whom they were rating.

2. Military Aptitude Rating: Flying Department 
(MAR Fly)—(See description of MAR Tac) Ratings 
by the flying instructors of the Ss. 

3. Educability (Educ)—Represents a factor found 
in analysis (Kubala, 1958) of training level criterion 
data. In experimental sample, derived from a Mil- 
itary Aptitude Rating by academic department in- 
structors and an overall rating of an S’s adaptability 
by weather department instructors. In Greenville 
group, derived from the overall adaptability rating 
by weather department instructors and the S’s final 
academic average. High score = high educability.

Peer Ratings:

1. Military Aptitude Rating: Classmates (MAR
Cl)—(See description of MAR Tac) Ratings by men
in the same class as the Ss whom they were rating.


Three of the 13 variables, Pers Adj, Educ, and 
AAR are arithmetic composites of other variables.
The first is the sum of ratings on a 3-point scale by 
a psychologist of each S’s Stress Reactions and Tol- 
erance and Interpersonal Relations. 

For the Randolph experimental sample, the Educ 
variable is a weighted composite of ratings by an 
instructor in the weather department of the primary 
flying school (4-point scale) and a Military Aptitude 
Rating by the academic department of the school 
(normalized 9-point scale). For the total Greenville 
group and validation sample, the Educ variable is a 
weighted sum of the weather department instructor’s 
rating and the S’s final academic average (normalized 
9-point scale) in the primary flying school. 

The AAR is the normalized sum (X̄ = 50, σ = 10)
of a number of specific trait ratings made by psychologists
at the time the Ss graduated from primary
pilot training. Each S was seen by one of nine
psychologists⁴ who administered a battery of psychological
tests and interviewed him. On the basis of this
information, the psychologist then made 2 overall
ratings of motivation and personality plus 20 specific
trait ratings. (Two of the latter were the ratings
described as forming the Pers Adj variable.) In computing
the AAR, the ratings of each psychologist were
handled separately in order to eliminate rater differences.
Kubala (1956) reported the reliability of the
AAR to be .85.⁵
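The text says only that each psychologist's ratings were handled separately and that the AAR is a sum normalized to a mean of 50 and a standard deviation of 10. One plausible reading, sketched below as an assumption, is to sum each S's trait ratings and then convert the sums to T scores within each psychologist's own caseload, so that a lenient and a severe rater end up on the same scale. The sketch uses a linear z-to-T conversion; "normalized" in the original may instead mean an area (percentile-based) transformation. All identifiers and ratings are hypothetical.

```python
from statistics import mean, pstdev


def actuarial_scores(records):
    """records: (psychologist, subject, trait_ratings) tuples.

    Returns {subject: T score}, standardizing trait-rating sums within each psychologist."""
    sums_by_rater = {}
    for rater, subject, ratings in records:
        sums_by_rater.setdefault(rater, []).append((subject, sum(ratings)))

    scores = {}
    for rater, rows in sums_by_rater.items():
        sums = [total for _, total in rows]
        m, sd = mean(sums), pstdev(sums)
        for subject, total in rows:
            z = (total - m) / sd if sd > 0 else 0.0
            scores[subject] = 50 + 10 * z  # linear T score: mean 50, SD 10
    return scores


# Hypothetical: psychologist B rates everyone higher than psychologist A.
records = [
    ("A", "s1", [2, 3, 5]),     # sum 10
    ("A", "s2", [6, 7, 7]),     # sum 20
    ("B", "s3", [9, 9, 12]),    # sum 30
    ("B", "s4", [15, 15, 20]),  # sum 50
]
print(actuarial_scores(records))  # {'s1': 40.0, 's2': 60.0, 's3': 40.0, 's4': 60.0}
```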

The AR was formed from the bivariate distribution
of the overall ratings of motivation and personality
just mentioned. Another global measure,
AG, represents an overall rating by a psychologist
using all information available on an S, including the
AR and the AAR. The Ss were rated on a 5-point
scale ranging from 1 (very well adjusted to training
with a good prognosis for the future) to 5 (very
poorly adjusted to training with a poor prognosis
for the future).

In 1954 and 1955, OER data were obtained from
various Air Force records centers for the Ss. Two
different OER forms⁶ were in use during the time
period covered by this investigation. To derive an
OER score for each S, distributions were made of
each OER type. From the distributions, the OERs
for each S were converted to standard scores. The
standard scores were summed, averaged, and the
averages converted to a stanine scale. The stanine
score is the OER criterion variable.
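The OER criterion described above combines ratings made on two different forms. The following sketch assumes that standard scores are computed within each form's distribution, averaged per S, and then assigned stanines by rank using the conventional 4-7-12-17-20-17-12-7-4 percentage bands. The text does not give the stanine procedure or say how ties were treated, so both the band method and the rank-based handling here are assumptions, as are all names and data.

```python
from statistics import mean, pstdev

# Conventional stanine percentages, lowest (1) to highest (9).
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]


def mean_standard_scores(oers):
    """oers: (subject, form, raw_rating) tuples; z scores are computed within each form."""
    by_form = {}
    for _, form, raw in oers:
        by_form.setdefault(form, []).append(raw)
    form_stats = {form: (mean(v), pstdev(v)) for form, v in by_form.items()}

    z_by_subject = {}
    for subject, form, raw in oers:
        m, sd = form_stats[form]
        z_by_subject.setdefault(subject, []).append((raw - m) / sd if sd else 0.0)
    return {subject: mean(zs) for subject, zs in z_by_subject.items()}


def to_stanines(values):
    """Assign stanines by rank order; tied values may receive different stanines here."""
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    upper_bounds, cumulative = [], 0
    for pct in STANINE_PERCENTS:
        cumulative += pct
        upper_bounds.append(cumulative * n / 100)
    stanines = {}
    for rank, subject in enumerate(ordered):
        position = rank + 0.5  # midpoint of this subject's place in the distribution
        stanines[subject] = next(k + 1 for k, b in enumerate(upper_bounds) if position <= b)
    return stanines


# Hypothetical: form "A" uses a 1-5 scale, form "B" a 10-50 scale.
oers = [("s1", "A", 1), ("s1", "B", 10), ("s2", "A", 3), ("s2", "B", 30)]
print(mean_standard_scores(oers))  # {'s1': -1.0, 's2': 1.0}
```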

4 Only two of the psychologists, both civilians, had
a PhD. Of the others, all had some graduate training
in psychology, most had master’s degrees, all were
Air Force officers, and two were rated pilots.

5 Unpublished data from the research files of the
former Department of Medical Psychology, School of
Aviation Medicine, USAF, now in the custody of the
Personnel Laboratory, WADD, Det. 1, Lackland
AFB, Texas, revealed that the AARs used for the
correlation were obtained from ratings of 44 Ss by
two psychologists. Although the same test data were
available to the psychologists, each interviewed the
S independently, or listened to a tape recording of
the other’s interview, and then made his ratings. This
type of reliability study is biased in favor of high
coefficients, but, at the time, was the only feasible
method of obtaining a reliability estimate.

6 AF Form 77, dated 15 March 1949; and AF
Form 77, dated 1 December 1951 or 15 November
1953.

An estimate of the reliability of the OER criterion
was obtained from 410 Ss having four or more
OERs. By a randomization process the OERs for
each S were divided into two subgroups. With the
same method used to compute the OER criterion
score, stanines were computed for each group separately.
Correlating the pairs of stanines for each S
yielded a product-moment coefficient of .54. When
this was corrected to reflect an average of four OERs
per S, the estimated reliability coefficient was .70.
This is somewhat higher than previously reported
reliabilities (Tupes, 1957).
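The text does not name the correction used to step the split-half coefficient of .54 up to .70. The Spearman-Brown formula with a lengthening factor of 2 (each half containing, on average, about two of the four OERs) reproduces that figure, so it is a reasonable guess at what was done; treat the function below as a sketch under that assumption.

```python
def spearman_brown(r, k):
    """Predicted reliability when a measure with reliability r is lengthened k times."""
    return k * r / (1 + (k - 1) * r)


split_half = 0.54
print(f"{spearman_brown(split_half, 2):.2f}")  # 0.70
```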

Statistical Methodology. From the data of the
Randolph experimental sample an intercorrelation
matrix was computed for the 13 training-level (predictor)
variables and the OER criterion.⁷ The number
of Ss having the appropriate scores varied for
each correlation, but in no case was less than 200.
Ten of the 13 predictors (AG, Pers Adj, and Educ
are excluded) were developed on groups containing
failures in flight training who never became officers.
Consequently the variances of these 10 were restricted
to some extent.⁸ No attempt was made to
correct for this in the analysis.

The regression equation and multiple correlation 
were computed and the predictor variables selected 
by DuBois’ (1957) method. The equation was then 
applied with integral weights to the data of the 
validation sample and the cross-validity coefficient 
computed.

Finally, product-moment correlations were com- 
puted between the 13 training-level criterion meas- 
ures and the OER criterion with Ss in the total 
Greenville sample who had the appropriate data.


RESULTS 


The regression analysis for the experimental
sample intercorrelation matrix was continued
until three variables (MAR Tac, Pers
Adj, and Educ, in order of extraction), yielding
a multiple correlation of .37, had been
identified. Tests of the contributions of other
variables to the multiple correlation (McNemar, 1955)
indicated that none added increments which
were significant at less than the .05 level.
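McNemar (1955) is cited for testing whether another variable adds significantly to the multiple correlation. A common form of that test, assumed here, is the F ratio for the increase in R squared when predictors are added. The function name and the numbers in the example are hypothetical; the R values obtained with a fourth predictor are not reported.

```python
def f_for_r2_increment(r2_full, r2_reduced, k_full, k_reduced, n):
    """F ratio for the gain in R^2 from k_reduced to k_full predictors with n cases.

    Returns (F, df_numerator, df_denominator)."""
    df_num = k_full - k_reduced
    df_den = n - k_full - 1
    f = ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)
    return f, df_num, df_den


# Hypothetical: a fourth predictor raises R from .37 to .38 with 200 cases.
f, df1, df2 = f_for_r2_increment(0.38**2, 0.37**2, 4, 3, 200)
print(f"F({df1}, {df2}) = {f:.2f}")
```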

The final regression equation in standard 
form was: 


Predicted OER = (.233) MAR Tac + (.199) Pers Adj + (.068) Educ


In deviation score form, the integral weights 
for the respective terms of the equation were: 
os) 4, 

Application of the deviation score form of 
the regression equation to the Greenville val- 
idation sample of 69 Ss produced a distribu- 
™ Table A containing the intercorrelation matrix 
has been deposited with the American Documenta- 
tion Institute. Order Document No. 6403 from ADI 
Auxiliary Publications Project, Photoduplication 
Service, Library of Congress; Washington 25, D. C., 
remitting in advance $1.25 for microfilm or $1.25 
for photocopies. Make checks payable to: Chief, 
Photoduplication Service, Library of Congress. 

3 See Trites and Sells (1957) for the unrestricted
variances of 9 of the 10 predictors.
TABLE 1

PRODUCT-MOMENT CORRELATIONS, AND RELATED
STATISTICS, BETWEEN TRAINING-LEVEL
MEASURES AND THE OER CRITERION

Training-Level
Criteria            r       N     Mean

AG (a)                     336    2.68
AAR                        303
AR                         310    7.25
MAR Cl                     296    5.04
MAR Up                     262    4.90
MAR Tac                    296    5.08
MAR Fly                    296    5.04
Pil Sta                    333    6.29
OQ Sta                     320    5.80


56.64 


Dem Sta (b)                355    4.92
Pers Adj (a)               300    3.21
Educ               .20**   226   13.60
Fly Gr                     223    6.09

(a) A low score indicates good adjustment.
(b) A low score indicates few demerits.
* Significant at less than the .05 level (two-tailed test).
** Significant at less than the .01 level (two-tailed test).


tion of predicted OERs which correlated .36 
with the actual OER criterion. 

Product-moment correlations between the 
13 training-level variables and the OER crite- 
rion are presented for the Randolph experi- 
mental sample and the total Greenville sample 
in Tables 1 and 2, respectively. With the 
exception of the Pers Adj and Educ variables, 
all correlations significant at the .05 level, or 
less, in the experimental sample, are signifi- 
cant at the .10 level, or less, in the Greenville 
group. It should be noted that the Pil Sta, OQ 
Sta, and Fly Gr variables did not reach sta- 
tistical significance in either group. 


DISCUSSION AND CONCLUSIONS


The results indicate that job performance 
criteria, as measured by ratings such as OERs, 
can be predicted by measures collected while 
a man is still in training for the job. As was 
hypothesized, the training-level adaptability
criteria are more highly related to post-training
effectiveness than are ability or aptitude
criteria.4

The correlations between the OER and 
adaptability criteria are a function of many 
determinants. From the data of this study, 


4 It should not be concluded that all measures of
aptitude or ability are unrelated to OERs. Measures
different from the ones used in the present study
might have had some predictive validity.


two of the determinants can be described in a
general way. The first is related to the degree
of congruence between the training and job
situations. Both seem to require ability and
adaptability, but as a man passes from training
to the job, the latter becomes increasingly
important. Because the requirements of the two
situations are congruent, raters who are very
familiar with both would have a greater
probability of making valid assessments during
training. Conversely, as training and subsequent
jobs become more dissimilar, prediction should
become less effective.

The second general determinant is the one 
pointed out by Williams and Leavitt (1947) 
in their discussion of the significant correla- 
tions between a combat performance criterion 
and measures collected prior to combat: “A 
common factor in all positive correlations 
appears to be personal judgments” (p. 289). 
This statement holds true for the OER crite- 
rion and the variables of the present study 
which are significantly related to it. 

The correlations in both samples between 
the OERs and the AG measure of adaptabil- 
ity indicate that clinicians can make mean- 
ingful assessments when they consider all the 
information available on an S. On the other 
hand, the AR and AAR validity coefficients 
are significant at only the .10 level in the total 
Greenville sample. However, in terms of mag- 


TABLE 2
PRODUCT-MOMENT CORRELATIONS, AND RELATED
STATISTICS, BETWEEN TRAINING-LEVEL
MEASURES AND THE OER CRITERION
FOR THE TOTAL GREENVILLE GROUP

Training-Level Criteria

* Significant at less than the .05 level (two-tailed test).
** Significant at less than the .01 level (two-tailed test).





Adaptability Measures as Predictors of Performance Ratings 


nitude they are equal to, or greater than, the 
corresponding correlations in the Randolph 
sample. In toto these findings support the 
validity of clinical psychologists as predictors. 

In spite of the validity of the psychologists’ 
evaluations, the practical utility of global 
adaptability assessments is open to question. 
Since MARs are much easier and less expen- 
sive to obtain, they may be sufficient for 
predictive purposes. The correlations between 
OERs and the Educ variable support the 
argument. 

In the Randolph experimental sample, Educ 
made a unique, significant contribution to the 
multiple correlation, but in the total Green- 
ville sample its validity was greatly reduced. 
The difference between the samples can be 
related most parsimoniously to the different 
methods, previously described, of obtaining 
the Educ variable. Considering the validity of 
the various MARs and the lack of validity of 
the aptitude and ability measures in both 
groups, it may be concluded that the valid
variance of Educ in the experimental sample
is primarily attributable to the presence of a
MAR by the academic department instructors,
and that the Academic Average substituted for
the MAR Acad in the Greenville group has no
predictive validity.


SUMMARY 


A regression equation based upon data collected
during primary pilot training yielded a multiple
correlation of .37 against a criterion of Air
Force Officer Effectiveness Ratings (OERs)
collected subsequent to training. Applying the
equation to data from an independent sample
yielded a cross-validity coefficient of .36. The
nature of the variables entering into the
equation and consideration of the first-order
correlations between OERs and the predictors in
both samples led to the conclusions that: (a)
measures of adaptability to training and Air
Force life are more highly related to later
officer performance than are measures of aptitude
or ability; (b) assessments of a man’s functioning
involving personal judgments of peers, superiors,
and experts are predictive of later performance
as an Air Force officer. Insofar as OERs are
similar to procedures used in evaluating job
performance in other situations, the results of
the study may be generalized.


REFERENCES 

DuBois, P. H. Multivariate correlational analysis.
New York: Harper, 1957.

Kubala, A. L. A dimensional analysis of training
level criteria of adaptability and performance, and
their relationship to future success. Unpublished
doctoral dissertation, Univer. Texas, 1956.

Kubala, A. L. Adaptability screening of flying
personnel: Preliminary analysis and validation of
criteria of adaptability to military flying. USAF
Sch. Aviat. Med. Rep., 1958, No. 58-141.

McNemar, Q. Psychological statistics. (2nd ed.) New
York: Wiley, 1955.

Sells, S. B. A research program on the psychiatric
selection of flying personnel: I. Methodological
introduction and experimental design. USAF Sch.
Aviat. Med. Rep., 1951, No. 1. (USAF Proj. No.
21-37-0002)

Sells, S. B. Development of a personality test
battery for psychiatric screening of flying
personnel. J. aviat. Med., 1955, 26, 35-45.

Sells, S. B., & Barry, J. R. A research program to
develop psychiatric selection of flying personnel.
J. aviat. Med., 1953, 24, 29-47.

Trites, D. K., Kubala, A. L., & Cobb, B. B.
Development and validation of adaptability
criteria. J. appl. Psychol., 1959, 43, 25-30.

Trites, D. K., & Sells, S. B. Combat performance:
Measurement and prediction. J. appl. Psychol.,
1957, 41, 121-130.

Tupes, E. C. Psychometric characteristics of officer
effectiveness reports of OCS graduates. USAF
Personnel Train. Res. Cent. res. Rep., 1957,
AFPTRC-TN-57-20.

Williams, S. B., & Leavitt, H. J. Group opinion as
a predictor of military leadership. J. consult.
Psychol., 1947, 11, 283-291.


(Received January 5, 1960)





Journal of Applied Psychology
1960, Vol. 44, No. 5, 35 $7 


54-35 
ANALYSIS OF A JOB EVALUATION SYSTEM¹

FRANCIS D. HARDING, JOSEPH M. MADDEN, AND KENNETH COLSON²


Wright Air Development Division, United States Air Force 


In past years, job evaluation procedures 
have been studied by many psychologists. 
Viteles (1941) probably stimulated much of 
the activity with his comment “that beneath 
the superficial orderliness of job evaluation 
techniques there is much that smacks of 
chaos” (p. 165). He listed the following 
problems as requiring attention: the reduction 
of the number of factors to a minimum con- 
sistent with adequate differentiation of jobs, 
better definition of the factors used, deter- 
mination of the number of points to be used 
in the scales, and the application of scientific 
concepts of measurement. 

Several studies (Ash, 1948; Grant, 1951; 
Howard & Schultz, 1952; Lawshe & Satter, 
1944) have dealt with factor analyses of job 
evaluation plans. The general finding has been 
that instead of the ten or more factors usually 
used, only three or four factors were needed 
to obtain the same results. Depending upon 
the plan being analyzed and the type of jobs 
studied, the basic factors were concerned with 
skill demands, responsibility, working condi- 
tions, and job hazards. Since these studies 
contained no outside reference variables to 
help in the interpretation of the analyses, the 
results are pertinent more to the overlap of
factors than to the determination of basic
factors to be used in job evaluation.

Other studies (Chesler, 1948; Lawshe &
Farbo, 1949) have attempted to measure the
reliability or consistency of evaluations by
correlating ratings made at separate times or
by agreement between independent raters.
The consensus of these studies was that
trained analysts can reliably or consistently
apply the evaluation plans.


The present study is an analysis of certain 
aspects of the job evaluation system used by 
the United States Air Force. Special attention 


1 The research reported in this paper was sponsored
by Personnel Laboratory, Wright Air Development
Division, under ARDC Project 7734, Task 17015.

2 Now at Nortronics, Anaheim, California.
has been given to three questions: How much
difference is there between consensus evalua- 
tions which are based upon discussions among 
raters and the averages of independent rat- 
ings? What is the interrelationship between 
the separate factors and how much do they 
contribute to the total evaluation of a job? 
What is the reliability or consistency of the 
evaluations? 


PROCEDURE 


Evaluation procedures. The raters used were
individuals who were either supervisors or
craftsmen in the jobs being evaluated. For most of
these individuals it was the first time that they
had used the job evaluation procedures. Each
individual rated only the particular job in which
he had technical competence. Evaluations were
based not on written descriptions but rather on
job requirements as perceived by these technical
experts. The job evaluation plan used was a point
system using 10 factors, each of which was divided
into six rating levels. The factors used and the
weights assigned to each factor were: 1. Knowledge
(60); 2. Physical Skill (20); 3. Adaptability and
Resourcefulness (30); 4. Responsibility for Money
and Materials (10); 5. Responsibility for Safety of
Others (10); 6. Responsibility for Directing Others
(30); 7. Physical Effort (10); 8. Attention (20);
9. Job Conditions (10); and 10. Military and Combat
(20). Each rater independently evaluated the job on
each factor and then conferred with one other rater
to arrive at a consensus evaluation for each factor.
A total score for the job was computed by
multiplying the rating on each factor by the weight
assigned to the factor and summing for the 10
factors. In this manner, individual evaluations and
193 two-man consensus evaluations were obtained.

Jobs evaluated. The sample of 50 Air Force
Specialties included in the study was chosen to be
representative of all the career fields in the Air
Force and of the relative numbers of airmen in each
career field. Within each specialty a particular
duty position or job was selected to be evaluated.
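The point-system total can be sketched as a weighted sum over the ten factors. This assumes the ten weights as listed for the plan, with Physical Skill taken as 20, and assumes the six rating levels are coded 1 through 6; the function and constant names are illustrative.

```python
# Factor weights as listed for the plan (Physical Skill taken as 20: an assumption).
WEIGHTS = {
    "Knowledge": 60,
    "Physical Skill": 20,
    "Adaptability and Resourcefulness": 30,
    "Responsibility for Money and Materials": 10,
    "Responsibility for Safety of Others": 10,
    "Responsibility for Directing Others": 30,
    "Physical Effort": 10,
    "Attention": 20,
    "Job Conditions": 10,
    "Military and Combat": 20,
}
LEVELS = range(1, 7)  # assumption: the six rating levels are coded 1..6


def total_score(ratings):
    """Multiply each factor rating by its weight and sum over the 10 factors."""
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for factor in WEIGHTS:
        if ratings[factor] not in LEVELS:
            raise ValueError(f"{factor}: level {ratings[factor]} outside 1-6")
    return sum(WEIGHTS[f] * ratings[f] for f in WEIGHTS)


# Hypothetical job rated at level 3 on every factor.
print(total_score({factor: 3 for factor in WEIGHTS}))  # 660
```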
RESULTS AND DISCUSSION


Relationships between individual and consensus
scores. An important step in the operation of many
job evaluation plans occurs after the individual
judges have made their evaluations, when they
confer to determine a consensus on any factors
about which they disagree. It is taken for granted
that certain benefits are derived from holding
conferences. Supposedly, more reliable evaluations
are obtained or greater acceptance of the
evaluations is gained. However, since arranging
and conducting such conferences can become
cumbersome and costly, it seems worthwhile to
compare the two-man consensus evaluations that
result from conferences with evaluations that would
result from averaging the judges’ independent
ratings. Questions such as the following may be
asked: “Does the consensus of the judges tend to be
higher or lower than an average of their
evaluations?” “How much of the variance in the
consensus evaluations can be accounted for by
averages of individual ratings?” “How much
difference is there between averages and consensus
scores?” The answers to these questions will permit
a tentative appraisal of the usefulness of such
conferences.


A frequency distribution of the differences 
between the consensus scores and the averages 
of the two men’s individual evaluations was 
computed for each factor. On all factors very 
close agreement was found between the aver- 
ages and the consensus evaluations. Thus it 
appears that there is no consistent tendency 
on the part of the conferees to compromise 
their evaluations in a higher or lower direction. 

A study of the total scores computed by
combining the levels assigned each factor by
a conference showed similar results. Because
of the similarity between average and consensus
scores, the intercorrelations between factors
presented in Table 1 are those obtained by
correlating average scores on each factor.

TABLE 1
INTERCORRELATIONS OF FACTORS AND TOTAL SCORE

Note. Intercorrelations are based upon averages of
individual ratings. Decimals omitted.

Evaluation System

TABLE 2
PROPORTION OF VARIANCE IN A FACTOR WHICH CAN
BE PREDICTED FROM OTHER FACTORS

Factor                                 Factors Contributing Most to Prediction
Physical Skill
Adaptability & Resourcefulness
Responsibility for Money & Materials
Responsibility for Safety of Others
Responsibility for Directing Others
Physical Effort
Attention
Job Conditions


The question as to how much variance in
the conference scores based upon consensus
can be accounted for by the average scores
was answered by regression analysis. It was
found that 95% of the variance in the criterion
(total scores based upon consensus of
conferees) was predictable from the averages
of the raters’ evaluations. There would appear
to be fairly high agreement between consensus
total scores and scores computed from the
average of individual evaluations.
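Both comparisons in this subsection, the distribution of consensus-minus-average differences and the proportion of consensus variance predictable from the averages, can be sketched for total scores as follows. With a single predictor, the proportion of variance accounted for by a least-squares line equals the squared correlation. The conference scores are hypothetical and the function names illustrative.

```python
from collections import Counter
from statistics import mean


def r_squared(x, y):
    """Proportion of variance in y predictable from x by a least-squares line (r squared)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def compare_consensus(pairs, consensus):
    """Compare each conference's consensus score with the mean of its two independent ratings."""
    averages = [(a + b) / 2 for a, b in pairs]
    differences = [c - m for c, m in zip(consensus, averages)]
    return {
        "distribution": Counter(differences),  # frequency of each difference
        "mean_difference": mean(differences),  # > 0 would suggest upward compromise
        "r_squared": r_squared(averages, consensus),
    }


# Hypothetical total scores for four conferences: (rater A, rater B) and the consensus.
pairs = [(400, 420), (610, 590), (300, 340), (520, 520)]
consensus = [410, 600, 330, 520]
print(compare_consensus(pairs, consensus))
```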

Analysis of interrelationships within the system.
The factor intercorrelations (Table 1) and the
contribution of factors to the prediction of other
factors (Table 2) are quite useful in interpreting
the relationships between factors. In this study
three factors, Knowledge, Adaptability and
Resourcefulness, and Attention, appear to form one
group; three others, Responsibility for Safety of
Others, Physical Effort, and Job Conditions, also
seem to be related. An iterative procedure
(Greenberger & Ward, 1956) was used to perform
several regression analyses in which each factor,
in turn, was treated as the criterion while the
remaining factors were the predictors. In this way,
it is possible to determine the percentage of
variance in each factor which is predictable from
the other nine factors. The small amount of
predictable variance found in this study may be
interpreted as evidence that each factor in the
system contains a considerable amount of unique
variance. Therefore, one may be inclined to
question the advisability of previous efforts to
reduce the number of factors used in job evaluation
systems. However, no generalizations are warranted
unless account is taken of the reliability as well
as the uniqueness of the variance of each factor.
Unfortunately, as will be shown later, this is
difficult to do, for different techniques of
measuring reliability yield widely divergent
estimates of reliability.

TABLE 3
ESTIMATES OF RELIABILITY BASED ON INTERCORRELATIONS AND COMPONENTS OF VARIANCE

                                       Single Rater                    Average of 5 Raters
Factor                         Intercorrelation  Component of   Intercorrelation  Component of
                                                 Variance                         Variance
Knowledge
Physical Skills
Adaptability and Resourcefulness
Responsibility for Money and Materials
Responsibility for Safety of Others
Responsibility for Directing Others
Physical Effort
Attention
Job Conditions
Military and Combat
Total Score
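The proportion of variance in each factor predictable from the other nine is that factor's squared multiple correlation. The authors used an iterative regression procedure; the sketch below substitutes the standard closed-form identity SMC_i = 1 − 1/(R⁻¹)_ii, where R is the factor intercorrelation matrix. The 3 × 3 matrix in the example is hypothetical, not the study's Table 1.

```python
def invert(matrix):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(corr):
    """SMC of each variable regressed on all the others: 1 - 1 / (R^-1)_ii."""
    inv = invert(corr)
    return [1.0 - 1.0 / inv[i][i] for i in range(len(corr))]


# Hypothetical intercorrelations among three factors.
corr = [
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]
print([round(v, 3) for v in squared_multiple_correlations(corr)])
```

A small SMC for a factor means little of its variance is shared with the others, which is the pattern the authors read as evidence of unique variance.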

Contribution of individual factors to total 
composite score. The use of multiple regres- 
sion analysis makes it possible to determine 
how much each predictor variable adds to the
prediction of the criterion, i.e., its independ- 
ent or unique contribution. This is accom- 
plished by first computing a regression anal- 
ysis in which all the predictor variables are 
allowed to operate. In each successive anal- 
ysis, one factor is left out of the group of 
predictors used. In each analysis a different 


.157 945 480 


.294 869 675 
Be 8 509 


.629 


factor is removed until all possible combina- 
tions of N-1 predictors have been used as 
predictors. The differences between the pro- 
portion of variance accounted for when the 
full array of predictors is used and when the 
reduced groups of N-1 predictors are used 
reflect the independent contribution of the 
predictors which have been removed. The 
significance of the contribution can be tested 
by computing an F ratio of independent esti- 
mates of variances: the numerator consists of 
the sum of measurement error plus the con- 
tribution of the variable being tested; the 
denominator consists only of the variance due 
to measurement error. 
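The leave-one-out procedure can be sketched from a correlation matrix: compute R² with all predictors, recompute it with each predictor removed, and test each drop with the usual F ratio for an increment in R², F = (R²_full − R²_reduced) / [(1 − R²_full)/(N − k − 1)] on 1 and N − k − 1 degrees of freedom. Reading the paper's description as this standard increment test is an assumption; the matrix, N, and function names below are hypothetical.

```python
def invert(matrix):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def r_squared_from_corr(corr, criterion, predictors):
    """R^2 of the criterion on the predictors: r_xy' R_xx^-1 r_xy."""
    rxx = [[corr[i][j] for j in predictors] for i in predictors]
    rxy = [corr[i][criterion] for i in predictors]
    inv = invert(rxx)
    betas = [sum(inv[i][j] * rxy[j] for j in range(len(rxy))) for i in range(len(rxy))]
    return sum(b * r for b, r in zip(betas, rxy))


def unique_contributions(corr, criterion, predictors, n):
    """For each predictor: (drop in R^2 when it is removed, F on 1 and n - k - 1 df)."""
    k = len(predictors)
    full = r_squared_from_corr(corr, criterion, predictors)
    df_error = n - k - 1
    results = {}
    for p in predictors:
        reduced = r_squared_from_corr(corr, criterion, [q for q in predictors if q != p])
        gain = full - reduced
        results[p] = (gain, gain / ((1.0 - full) / df_error))
    return full, results


# Hypothetical matrix: index 0 is the total score, 1 and 2 are two uncorrelated factors.
corr = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.0],
    [0.3, 0.0, 1.0],
]
full, results = unique_contributions(corr, criterion=0, predictors=[1, 2], n=50)
print(round(full, 2), {p: (round(g, 2), round(f, 1)) for p, (g, f) in results.items()})
```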

The above procedures were carried out us- 
ing data presented in Table 1. Six of the 
factors were found to make contributions 
which are significant at the .01 level. These 
are: Knowledge, Physical Skills, Adaptability 
and Resourcefulness, Responsibility for Di- 
recting Others, Attention, and Military and 
Combat. 

Reliability. The reliability of the ratings
obtained in the evaluation of the jobs was
estimated in two ways. The first was the
traditional correlational approach in which
the ratings given by one member of each
conference were compared with the ratings of
the other individual in the conference. The
second approach was a comparison of
components of the rating variance. For this
purpose the ratio of the variance of the true
scores to the variance of the obtained scores
for the population was computed as described
by Lindquist (1956, p. 361). These reliability
estimates represent the percentage of variance
which is not considered as error variance.
Only variance resulting from differences
between jobs was considered as nonerror
variance; all other variation was considered as
error. This results in an estimate of reliability
in which any differences between raters are
considered as error variance. Both estimates
of consistency of the ratings for each factor
and for the total score are presented in Table
3. Since job evaluation committees usually
consist of about five individuals, this number
was used to estimate the consistency which
would be obtained in an operational setting.
The correlational estimates of consistency are
fairly high and are typical of such measurements
of interrater agreement. The components of
variance estimates were lower and more spread
out.
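A components-of-variance estimate of this kind can be sketched with a one-way analysis of variance, assuming raters are nested within jobs (each rater judged only his own job) and every job has the same number k of ratings, so that rater differences fall into the within-job error term. Whether this matches Lindquist's formulation in every detail is an assumption. The step from one rater to five is sketched with the Spearman-Brown formula, also an assumption about how the five-rater figures were obtained; in this design it gives the same value as dividing the error component by five. Data and names are hypothetical.

```python
from statistics import mean


def one_way_reliability(ratings):
    """Single-rater reliability: job variance / (job variance + within-job error variance).

    `ratings` holds one list per job, each containing the same number k of ratings.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for job in ratings for x in job)
    job_means = [mean(job) for job in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in job_means) / (n - 1)
    ms_within = sum((x - m) ** 2 for job, m in zip(ratings, job_means) for x in job) / (n * (k - 1))
    var_jobs = max((ms_between - ms_within) / k, 0.0)  # component of variance due to jobs
    return var_jobs / (var_jobs + ms_within)


def spearman_brown(r_single, m):
    """Reliability of the average of m raters, given single-rater reliability."""
    return m * r_single / (1 + (m - 1) * r_single)


# Hypothetical levels for three jobs, two raters each; the second rater is
# consistently 2 points higher, and that rater difference is counted as error.
ratings = [[2, 4], [3, 5], [6, 8]]
single = one_way_reliability(ratings)
print(round(single, 3), round(spearman_brown(single, 5), 3))
```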

These differences between estimates of reli- 
ability based upon different approaches were 
mentioned in the preceding discussion of the 
interrelationships between the factors. Such 
differences make it difficult to quantify the 
amount of unique reliable variance remaining 
in the factors. Here, the greatest agreement 
was obtained for those factors having a more 
objective basis in reality, e.g., Working Con- 
ditions, Physical Effort, and Responsibility 
for Safety; very little agreement was found 
for the more intangible factors dealing with 
supervision or leadership. 


SUMMARY AND CONCLUSIONS 


The Air Force’s job evaluation system was 
applied to a sample of jobs. The results were 
analyzed to determine: (a) differences between
evaluations based upon consensus of two-man
conferences and evaluations based upon the
average of their independent judgments, (b)
interrelationships between factors, and (c)
reliability of the evaluations.

It was shown that a simple averaging of 
individual ratings very closely approximates 
consensus ratings derived from discussion be- 
tween two judges. This finding gives rise to 
questions about the need for judges to meet 
and attempt to agree on consensus ratings. 

While only six of the factors made signifi- 
cant independent contributions to the total 
score, only a small amount of variance in each 
factor was predictable from the other factors. 
These findings suggest that it may not always 
be advisable to seek to reduce the number of 
factors used in the present job evaluation plan. 

Two measures of reliability were computed. 
The correlational approach provided fairly 
high values, while estimates based on compo- 
nents of variance were lower. The latter esti- 
mate is considered more meaningful. The dis- 
crepancy between the estimates illustrates the 
difficulty of computing the amount of unique 
reliable variance in each factor. 
The purpose of this research is to investigate
the possibility of predicting success in
supervisory training programs through better 
selection techniques. The type of supervisory 
training utilized in this study is the “em- 
ployee or person centered” program. In this 
type of training the supervisor is helped to 
understand both the problem and the indi- 
viduals concerned. 

In this research, the first five weeks of the 
program consisted of an introduction to the 
factors important in understanding human 
relations and general human behavior. This 
refers to such things as motivation, percep- 
tion, attitudes, frustration, reaction to frus- 
tration, and other personality variables. The 
second five weeks of the training program 
dealt with research conducted in human rela- 
tions and supervisory methods in various in- 
dustrial and business organizations on various 
managerial levels. The student was thus given 
a general understanding of what is meant by 
human relations. The last five weeks of the 
course consisted of the application by the 
student of this knowledge to problems and 
cases presented by the instructor. During this 
time the student analyzed cases individually, 
discussed cases in class, participated in role 
playing, and presented problems of his own 
for class discussion. In general, the last por- 
tion of the course dealt entirely with appli- 
cation. 

Three tests were selected as a means of
measuring aspects of the potential supervisor:
(a) the Wonderlic Personnel Test (Wonderlic,
1945); (b) the How Supervise? Scale (File &
Remmers, 1948); (c) the F Scale (a measure of
Authoritarian Personality) (Adorno,
Frenkel-Brunswik, Levinson, & Sanford, 1950).

The sample consisted of the students en- 
rolled in a supervisory training course, given 

1 Now at Long Island University. 

2 Now at the University of Washington.


for 3 semester hours’ credit, in the Evening 
Division of the University of Kansas City. 
Approximately 75% of the class were from
the business world in supervisory or
middle-management positions. The other 25%
consisted of individuals from the business world
who were not in supervisory positions and a
sprinkling of full-time students majoring in
business administration or industrial psychology.
About one-third of the class had their tuition
paid by their employers.

At the first class meeting the students were
given the three tests, the Wonderlic, How
Supervise?, and the F Scale.3 The tests were
administered and scored by a graduate fellow
in the psychology department, so that the
instructor did not see the individual scores until
after the final grades for the course had been
assigned.

The criterion of success in supervisory
training was the numerical grade which the
student received at the end of the course.
Theoretically, 50% of this numerical grade 
was based on two multiple-choice tests (maxi- 
mum 150 points), one of which was given at 
about Week 8 of the course and one given as 
a part of the final examination. The questions 
came from both the lecture and the textbook 
(Smith, 1955). The items were related to the 
course content and application of this content 
to supervisory problems. At the end of Week 
6, Week 12, and during the final examination 
the students were given a case to analyze. 
These cases together (maximum 150 points) 
theoretically accounted for the other 50% of 


3 The F Scale was modified from a six-point scale
of three “agrees” and three “disagrees” with positive
and negative scoring to a five-point scale: (a)
strongly disagree, (b) disagree, (c) neither agree
nor disagree, (d) agree, (e) strongly agree. Also the
question concerning prewar authorities in Germany
(No. 13 on the original scale) was removed as no
longer applicable. Thus the scale had 28 questions,
with a minimum score of 28 and a maximum score of
140. Pretest research showed correlations in the
high .90s between this scale and the original
six-point scale.
the student’s grade. In actuality the students
did better on the objective tests than on the
cases, so that the objective tests carried
slightly more weight in the total criterion.
Each of the first two cases was graded,
returned, and discussed in class before the final
examination, as were several other case
problems.

The statistical analysis consisted of the
computation of rank-difference correlations
between the criterion and each of the predictive
tests and between the predictive tests.
Further analysis included computation of
partial and multiple correlations between the
predictive tests and the total criterion, and
between the predictive tests and the final
case as the criterion. Also a cut-off score was
empirically determined for the F Scale and
the How Supervise? Scale, both independently
and combined, to give the highest level of
successful prediction and the smallest amount
of selection and rejection error.
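The rank-difference coefficient is ρ = 1 − 6Σd²/[N(N² − 1)], where d is the difference between a student's ranks on the two variables. A minimal sketch, assuming tied scores receive the mean of the ranks they span (with ties the formula is only approximate); the `descending` flags are a hypothetical convenience for ranking a scale on which a low score is the favourable end. Scores are hypothetical.

```python
def ranks(values, descending=True):
    """Rank scores (1 = highest when descending), giving ties the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for idx in order[start:end + 1]:
            result[idx] = mean_rank
        start = end + 1
    return result


def rank_difference_rho(x, y, x_descending=True, y_descending=True):
    """Spearman rank-difference correlation: 1 - 6 * sum(d^2) / (N (N^2 - 1))."""
    rx, ry = ranks(x, x_descending), ranks(y, y_descending)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores for six students.
how_supervise = [60, 48, 55, 41, 52, 66]
course_grade = [250, 210, 240, 190, 205, 270]
print(round(rank_difference_rho(how_supervise, course_grade), 3))
```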

In analyzing the data a number of interesting
results appear. First, the psychological
tests from which predictions are being
made are relatively independent of each other.
There is only one correlation (Table 1), that
between the How Supervise? and the Wonderlic,
which even approaches statistical significance
(p < .10).

On the first case, which was given during
Week 6 of the course, after the students had
received five weeks of basic training in
psychology with no class experience in the
application of this material, only the Wonderlic
seemed to have predictive significance (rho =
.39). After this case the importance of the
Wonderlic seems to disappear. Of course this
is not an unselected population. It is probably
safe to assume that in order to be admitted
to the university the class members passed
entrance examinations or the equivalent, placing
them in at least the upper half of the
population in terms of intelligence. Further,
many had been preselected by the employer
for this particular kind of training. However,
no one was selected by psychological tests
other than intelligence tests.

The How Supervise? Scale predicted best 
on the total criterion (rho = .69). It also pre- 
dicted best on the objective tests (rho = .67) 
and on the total cases (rho = .58). The F 
TABLE 1

TOTAL SCORE AS CRITERION

Multiple correlation matrix

Variable               Second-Order Partial Correlations    Multiple Correlations
1. How Supervise?
2. F Scale*
3. Wonderlic
4. Total score

Note. N = 32; a rho of .349 is significant at the .05 level.
* On the F Scale a high score indicates a more authoritarian personality;
Rank 1 was assigned to the lowest F Scale score, whereas on all other
variables Rank 1 was assigned to the highest score.


Scale, on the other hand, predicted best on 
the final case (rho = .54). 

In Table 1 the partial correlation between 
the How Supervise? and the total criterion 
with the F Scale and the Wonderlic partialed 
out is .67, the highest partial in the matrix. 

With a multiple correlation of .77 between the
psychological tests and the total criterion, the
tests appear to predict 59% of the variation
in the students’ grades for this course.
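Both the second-order partial correlations and the multiple correlation can be obtained from the inverse P of the intercorrelation matrix: the partial correlation of variables i and j with the rest held constant is −P_ij/√(P_ii P_jj), and the squared multiple correlation of i on the rest is 1 − 1/P_ii. The 59% figure is simply R² = .77² ≈ .59. The matrix below is hypothetical, not the study's Table 1, and the function names are illustrative.

```python
from math import sqrt


def invert(matrix):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlation(corr, i, j):
    """Correlation of i and j with every other variable in the matrix partialed out."""
    p = invert(corr)
    return -p[i][j] / sqrt(p[i][i] * p[j][j])


def multiple_r(corr, i):
    """Multiple correlation of variable i with all the other variables."""
    p = invert(corr)
    return sqrt(1.0 - 1.0 / p[i][i])


# Hypothetical intercorrelations: 0 = How Supervise?, 1 = F Scale, 2 = Wonderlic, 3 = total score.
corr = [
    [1.00, 0.10, 0.30, 0.60],
    [0.10, 1.00, 0.05, 0.35],
    [0.30, 0.05, 1.00, 0.30],
    [0.60, 0.35, 0.30, 1.00],
]
print(round(partial_correlation(corr, 0, 3), 3), round(multiple_r(corr, 3), 3))
print(round(0.77 ** 2, 2))  # 0.59
```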

Any multiple correlation study, of course, 
needs cross-validation before it can be com- 
pletely accepted. While no true cross-valida- 
tion has been attempted for this study, there 
is some verification from previous research in 
the same kind of supervisory training course 
during an earlier semester. The correlations
obtained for the F Scale and the How Supervise?
Scale with the criteria showed the same
general result with a somewhat lower multiple 
correlation (R = .61). A small part of this 
difference was probably due to the absence of 
the Wonderlic in that matrix. 

As a further example of the usefulness of 
these tests for predictive purposes the group 
is divided at the mean score on the total 
criterion and this is used as a measure of 
training success, with those scoring above 
the mean being considered successful. It is 
found that by empirically determining a cut- 
off score of 52 on the How Supervise? Scale 
there are 16 people scoring 52 or above who
are also above the mean of the criterion, 
whereas only 2 score below the mean. Failure 
is predicted for 17 people, and there are 3 
people scoring above the criterion mean and 
14 scoring below. With this cut-off score the 
error of selection and rejection on the How 
Supervise? Scale is only 14%. 

Using an empirically determined F Scale 
cut-off score of 72 for predicting success, 17 
of the 26 people for whom success is predicted 
are above the mean on the criterion. When 
failure is predicted for the remaining nine, 
seven are below the criterion mean. Thus the 
selection and rejection error is 31%. 

By combining the two tests and the two cut- 
off scores, the tests correctly predict success 
for 14 people out of 15. Of those for whom 
failure is predicted, 5 are rated successful and 
15 are rated unsuccessful. Thus, 93% of those 
for whom success is predicted by the multiple 
cut-offs fulfill the criterion of success. Of 
those for whom failure is predicted by the
multiple cut-offs only 25% are successful.
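The three cut-off results can be checked by laying each one out as a 2 × 2 table of predicted versus actual success (above or below the criterion mean). The counts come from the text, with the F Scale's predicted-failure-but-successful cell taken as 9 − 7 = 2; the function and key names are illustrative.

```python
def classification_summary(hit_success, false_success, false_failure, hit_failure):
    """Summarize a 2 x 2 table of predicted vs. actual success.

    hit_success:   predicted success, above the criterion mean
    false_success: predicted success, below the criterion mean (selection error)
    false_failure: predicted failure, above the criterion mean (rejection error)
    hit_failure:   predicted failure, below the criterion mean
    """
    n = hit_success + false_success + false_failure + hit_failure
    return {
        "n": n,
        "error_rate": (false_success + false_failure) / n,
        "success_when_predicted": hit_success / (hit_success + false_success),
        "success_when_rejected": false_failure / (false_failure + hit_failure),
    }


tables = {
    "How Supervise? (cut-off 52)": classification_summary(16, 2, 3, 14),
    "F Scale (cut-off 72)": classification_summary(17, 9, 2, 7),
    "Combined cut-offs": classification_summary(14, 1, 5, 15),
}
for name, s in tables.items():
    print(f"{name}: N = {s['n']}, error = {s['error_rate']:.0%}, "
          f"success if selected = {s['success_when_predicted']:.0%}, "
          f"success if rejected = {s['success_when_rejected']:.0%}")
```

The single-test error rates (14% and 31%) and the combined figures (93% and 25%) reproduce the percentages reported in the text, each on a total of 35 students.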


In generalizing about the predictive value
of the three tests, it appears that without
specialized supervisory training, intelligence
seems to be the primary factor in the application
of knowledge to supervisory problems. The
How Supervise? Scale appears not to be an
achievement test as it was originally conceived,
but is to a large extent an attitude
or aptitude test. It predicts how much one can 
learn about supervision rather than what one 
already knows, whereas the F Scale is a more 
basic aptitude test which predicts better, after 
completion of training, how well one can 
apply knowledge to a supervisory problem. It 
can be concluded that in this preselected 
homogeneous group the use of the How Super- 
vise? Scale and the F Scale predicts with a 
high degree of accuracy those who are able 
to successfully complete a supervisory training
course. One cannot generalize these data
beyond the type of training program on which
this research was conducted. It is possible 
that these tests are predictive only in a course 
with the same goals, content, and training 
techniques used for this research. 